Abstract

We present a near-optimal reduction from approximately counting the cardinality
of a discrete set to approximately sampling elements of the set. An important
application of our work is to approximating the partition function $Z$ of a discrete system,
such as the Ising model, matchings or colorings of a graph. The typical approach to
estimating the partition function $Z(\beta^*)$ at some desired inverse temperature $\beta^*$ is to
define a sequence, which we call a cooling schedule, $\beta_0 = 0 < \beta_1 < \cdots < \beta_\ell = \beta^*$
where $Z(0)$ is trivial to compute and the ratios $Z(\beta_{i+1})/Z(\beta_i)$ are easy to estimate by
sampling from the distribution corresponding to $Z(\beta_i)$. Previous approaches required
a cooling schedule of length $O^*(\ln A)$ where $A = Z(0)$, thereby ensuring that each ratio
$Z(\beta_{i+1})/Z(\beta_i)$ is bounded. We present a cooling schedule of length $\ell = O^*(\sqrt{\ln A})$.

For well-studied problems such as estimating the partition function of the Ising
model, or approximating the number of colorings or matchings of a graph, our cooling
schedule is of length $O^*(\sqrt{n})$, which implies an overall savings of $O^*(n)$ in the running
time of the approximate counting algorithm (since roughly $\ell$ samples are needed to
estimate each ratio).

A similar improvement in the length of the cooling schedule was recently obtained
by Lovász and Vempala in the context of estimating the volume of convex bodies.
While our reduction is inspired by theirs, the discrete analogue of their result turns
out to be significantly more difficult. Whereas a fixed schedule suffices in their setting,
we prove that in the discrete setting we need an adaptive schedule, i.e., the schedule
depends on $Z$. More precisely, we prove any non-adaptive cooling schedule has length
at least $O^*(\ln A)$, and we present an algorithm to find an adaptive schedule of length
$O^*(\sqrt{\ln A})$.

1 Introduction 

This paper explores the intimate connection between counting and sampling problems. 
By counting problems, we refer to estimating the cardinality of a large set (or its weighted 
analogue), or in a continuous setting, an integral over a high-dimensional domain. The 
sampling problem refers to generating samples from a probability distribution over a large
set. The well-known connection between counting and sampling is the starting point for
popular Markov chain Monte Carlo methods for many counting problems. Some notable
examples from computer science are the problems of estimating the volume of a convex
body [5], [11] and approximating the permanent of a non-negative matrix [U].

In statistical physics, a key computational task is estimating a partition function, which 
is an example of a counting problem. Evaluations of the partition function yield estimates 
of thermodynamic quantities of interest, such as the free energy and the specific heat. 
The corresponding sampling problem is to generate samples from the so-called Gibbs (or 
Boltzmann) distribution.

We present an improved reduction from approximate counting to approximate sam- 
pling. These results improve the running time for many counting problems where efficient 
sampling schemes exist. We present our work in the general framework of partition func- 
tions from statistical physics. This framework captures many well-studied models from 
statistical physics, such as the Ising and Potts models, and also captures many natural
combinatorial problems, such as colorings, independent sets, and matchings. For the
purpose of this paper we define a (discrete) partition function as follows.

Definition 1.1. Let $n \ge 0$ be an integer. Let $a_0, \dots, a_n$ be non-negative real numbers
such that $a_0 \ge 1$. The function

$$Z(\beta) = \sum_{i=0}^{n} a_i e^{-\beta i}$$

is called a partition function of degree $n$. Let $A := Z(0)$.

This captures the standard notion of partition functions from statistical physics in the
following manner. The quantity $i$ corresponds to the possible values of the Hamiltonian.
Then $a_i$ is the number of configurations whose Hamiltonian equals $i$. For instance, in the
(ferromagnetic) Ising model on a graph $G = (V,E)$, a configuration is an assignment of
$+1$ and $-1$ spins to the vertices. The Hamiltonian of a configuration is the number of
edges whose endpoints have different spins. The quantity $\beta$ is referred to as the inverse
temperature. The computational goal is to compute $Z(\beta)$ for some choice of $\beta \ge 0$. Note that
when $\beta = 0$ the partition function is trivial, since $Z(0) = \sum_{i=0}^{n} a_i = 2^{|V|}$. The condition
$a_0 \ge 1$ is clearly satisfied; in fact, we have $a_0 = 2$ by considering the all $+1$ and the all
$-1$ configurations.
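As a small illustration of Definition 1.1 in the Ising setting, the following sketch computes the coefficients $a_i$ by brute-force enumeration and evaluates $Z(\beta)$. The function names and the 4-cycle example graph are assumptions made for this illustration, and enumeration is only feasible for very small graphs; here the degree $n$ is the number of edges.

```python
from itertools import product
from math import exp

def ising_coefficients(num_vertices, edges):
    """Return [a_0, ..., a_n] for the Ising Hamiltonian on the given graph.

    a_i counts the +1/-1 assignments with exactly i edges whose endpoints
    disagree; n = len(edges) is the largest possible Hamiltonian value.
    """
    coeffs = [0] * (len(edges) + 1)
    for spins in product((+1, -1), repeat=num_vertices):
        h = sum(1 for u, v in edges if spins[u] != spins[v])
        coeffs[h] += 1
    return coeffs

def partition_function(coeffs, beta):
    """Z(beta) = sum_i a_i * exp(-beta * i), as in Definition 1.1."""
    return sum(a * exp(-beta * i) for i, a in enumerate(coeffs))

# Hypothetical example graph: a cycle on 4 vertices.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
coeffs = ising_coefficients(4, cycle_edges)
print(coeffs)                           # coefficients a_0, ..., a_4
print(partition_function(coeffs, 0.0))  # Z(0) = 2**4 = 16
```

For the 4-cycle, the two constant assignments give $a_0 = 2$, and the coefficients sum to $2^{|V|} = 16$, matching the remarks above.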

The general notion of partition function also captures standard combinatorial counting
problems as illustrated by the following example. Let $\Omega$ be the set of all $k$-labelings of
a graph $G = (V, E)$ (i.e., labelings of the vertices of $G$ by numbers $\{1, \dots, k\}$). Given a
labeling $\sigma$, let its Hamiltonian $H(\sigma)$ be the number of edges in $E$ that are monochromatic
in $\sigma$. Let $\Omega_i$ denote the set of all $k$-labelings of $G$ with $H(\sigma) = i$. Let $a_i = |\Omega_i|$. We would
like to compute $Z(\infty) = a_0$, i.e., the number of valid $k$-colorings of $G$. Once again, the
case $\beta = 0$ is trivial since we have $Z(0) = k^{|V|}$. The condition $a_0 \ge 1$ simply requires that
there is at least one proper $k$-coloring.

The standard approach to compute $Z(\beta)$ is to express it as a telescoping product of
ratios of the partition function evaluated at a sequence of $\beta$'s, where the initial $\beta = 0$ is
the trivial case. The ratios are approximated using a sampling algorithm. More precisely,
consider a set of configurations $\Omega$ which can be partitioned as $\Omega = \Omega_0 \cup \Omega_1 \cup \cdots \cup \Omega_n$,
where $|\Omega_i| = a_i$ for $0 \le i \le n$. Suppose that we have an algorithm which for any inverse
temperature $\beta \ge 0$ generates a random configuration from the distribution $\mu_\beta$ over $\Omega$,
where the probability of a configuration $\sigma \in \Omega$ is

$$\mu_\beta(\sigma) = \frac{e^{-\beta H(\sigma)}}{Z(\beta)}, \tag{1}$$

where $H(\sigma)$ is the Hamiltonian of the configuration, defined as

$$H(\sigma) = i \quad \text{such that } \sigma \in \Omega_i.$$

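We now describe the details of the standard approach for using such a sampling algorithm
to approximately evaluate $Z(\beta)$. In the general setting of Definition 1.1, for $X \sim \mu_\beta$, the
random variable

$$W_{\beta,\beta'} := e^{(\beta - \beta') H(X)} \tag{2}$$

is an unbiased estimator for $Z(\beta')/Z(\beta)$. Indeed,

$$E(W_{\beta,\beta'}) = \frac{1}{Z(\beta)} \sum_{\sigma \in \Omega} e^{-\beta H(\sigma)} e^{(\beta - \beta') H(\sigma)} = \frac{1}{Z(\beta)} \sum_{\sigma \in \Omega} e^{-\beta' H(\sigma)} = \frac{Z(\beta')}{Z(\beta)}. \tag{3}$$

Thus, $a_0 = Z(\infty)$ can be approximated as follows. Take $\beta_0 < \beta_1 < \cdots < \beta_\ell$ with $\beta_0 = 0$
and $\beta_\ell = \infty$. Express $Z(\infty)$ as a telescoping product

$$Z(\infty) = Z(0)\, \frac{Z(\beta_1)}{Z(\beta_0)}\, \frac{Z(\beta_2)}{Z(\beta_1)} \cdots \frac{Z(\beta_\ell)}{Z(\beta_{\ell-1})} \tag{4}$$

and approximate each fraction in the product using the unbiased estimator $W_{\beta_i,\beta_{i+1}}$. The
initial term $Z(0)$ is typically trivial to compute.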
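The telescoping estimator can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's algorithm: the sampling oracle is replaced by a stand-in that draws the Hamiltonian value exactly from known coefficients (which would defeat the purpose in practice), and the coefficients, schedule and sample count are hypothetical choices.

```python
import math
import random

def Z(coeffs, beta):
    """Z(beta) = sum_i a_i exp(-beta i); Z(infinity) = a_0."""
    if math.isinf(beta):
        return float(coeffs[0])
    return sum(a * math.exp(-beta * i) for i, a in enumerate(coeffs))

def sample_hamiltonian(coeffs, beta, rng):
    """Stand-in for the sampling oracle: returns H(X) for X ~ mu_beta."""
    weights = [a * math.exp(-beta * i) for i, a in enumerate(coeffs)]
    return rng.choices(range(len(coeffs)), weights=weights)[0]

def w_value(beta, beta_next, h):
    """Value of W_{beta,beta'} from (2) when H(X) = h."""
    if math.isinf(beta_next):
        return 1.0 if h == 0 else 0.0  # limit of exp((beta - beta') h)
    return math.exp((beta - beta_next) * h)

def estimate_z_infinity(coeffs, schedule, samples_per_ratio, rng):
    """Multiply Z(0) by the sample mean of W for each schedule step, as in (4)."""
    estimate = Z(coeffs, 0.0)
    for beta, beta_next in zip(schedule, schedule[1:]):
        total = 0.0
        for _ in range(samples_per_ratio):
            h = sample_hamiltonian(coeffs, beta, rng)
            total += w_value(beta, beta_next, h)
        estimate *= total / samples_per_ratio
    return estimate

coeffs = [2, 0, 12, 0, 2]                       # hypothetical coefficients
schedule = [0.0, 0.5, 1.0, 2.0, 4.0, math.inf]  # hypothetical cooling schedule
rng = random.Random(0)
print(estimate_z_infinity(coeffs, schedule, 4000, rng))  # should be close to a_0 = 2
```

The last step of the schedule, to $\beta' = \infty$, uses the indicator of $H(X) = 0$, which is the limit of (2) as $\beta' \to \infty$.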

Taking sufficiently many samples for each $W_{\beta_i,\beta_{i+1}}$ will give a good approximation of $a_0$.
The question we study in this paper is: how should one choose the inverse temperatures
$\beta_0, \dots, \beta_\ell$ so as to minimize the number of samples needed to estimate (4)? A specific
choice of $\beta_0, \dots, \beta_\ell$ is called a cooling schedule.

In the past, MCMC algorithms have used cooling schedules that ensure that each ratio
in the telescoping product is bounded by a constant. Dyer and Frieze [2] used a non-trivial
application of Chebyshev's inequality to show that $O(\ell)$ samples per ratio are sufficient to
obtain a $(1 \pm \varepsilon)$ approximation of $Z(\infty)$.

For applications such as colorings or the Ising model, requiring that each ratio is at most
a constant implies that the length of the cooling schedule is at least $\Omega(n)$, since $Z(0)$ and
$Z(\infty)$ typically differ by an exponential factor. A general cooling schedule of length $O^*(n)$
was presented in Bezáková et al. [1]. All schedules prior to our work are non-adaptive.
By non-adaptive we refer to a schedule that depends only on $n$ and $A$
but not the structure of $Z$.

The recent volume algorithm of [11], [12] uses a non-adaptive cooling schedule of length
$O(\sqrt{n})$ to estimate the volume of a convex body in $\mathbb{R}^n$. Their result relies on the
logconcavity of the function $\beta^n Z(\beta)$ where $Z$ is the analogue of the partition function.

Here we present a cooling schedule for discrete partition functions with length roughly
$\sqrt{\ln A}$ where $A = Z(0)$. (Note, $\sqrt{\ln A}$ is roughly $\sqrt{n}$ in the examples we have been
considering here.) The discrete setting presents the following new challenge. As we show
in this paper, there can be no short non-adaptive cooling schedule for discrete partition
functions. Any such non-adaptive schedule has length $\Omega(\ln A)$ in the worst case. (We
defer precise statements of our results until we formally present the background material.)

Our main result is that every partition function does have an adaptive schedule of
length roughly $\sqrt{\ln A}$. Further, the schedule can be figured out efficiently on the fly.

The existence of a short schedule follows from an interesting geometric fact: any convex
function $f$ can be approximated by a piecewise linear function $g$ consisting of few pieces;
see Figure 1 in Section 4 for an illustration. More precisely, $f$ is approximated in the
following sense: for all $x \ge 0$, we have $0 \le g(x) - f(x) \le 1$.

For well-known problems such as counting colorings or matchings, and estimating the 
partition function of the Ising model, our results imply an improvement in the running 
time by a factor of n, since the complexity grows with the square of the schedule length; 
see Section 8 for a precise statement of the applications of our results.

We observe (in Section 4.1) that our techniques apply to the continuous setting as
well, specifically, to the integration of general functions in $\mathbb{R}^n$. The key property required
for the existence of an adaptive schedule is the logconvexity of the partition function
$Z(\beta)$. However, this does not immediately lead to any new algorithms for integration
since logconcave functions are the most general class of continuous functions for which we
have efficient sampling algorithms.

In Section 2 we formalize the setup described in this introduction. The lower bound
for non-adaptive schedules is formally stated as Lemma 3.3 in Section 3. The existence of
a short cooling schedule is proved in Section 4 and formally stated in Theorem 4.1. The
algorithm for constructing a short cooling schedule is presented in Section 5. Finally, in
Section 8 we present applications of our improved cooling schedule.



2 Chebyshev cooling schedules 

Let $W := W_{\beta_i,\beta_{i+1}}$ be the estimator defined by (2) whose expectation is an individual
ratio in the telescoping product. As usual, we will use the squared coefficient of variation
$\mathrm{Var}(W)/E(W)^2$ as a measure of the quality of the estimator $W$, namely to derive a bound
on the number of samples needed for reliable estimation of $E(W)$. We will also use the
quantity $E(W^2)/E(W)^2 = 1 + \mathrm{Var}(W)/E(W)^2$.

Lemma 2.1 (Chebyshev). Let $W$ be a random variable with $E(W) < \infty$ and $E(W^2) < \infty$.
Let $\varepsilon > 0$. We have

$$P\big((1-\varepsilon)E(W) \le W \le (1+\varepsilon)E(W)\big) \ge 1 - \frac{\mathrm{Var}(W)}{\varepsilon^2 E(W)^2} \ge 1 - \frac{E(W^2)}{\varepsilon^2 E(W)^2}.$$

The following result of Dyer and Frieze [2] is now well-known.

Theorem 2.2. Let $W_1, \dots, W_\ell$ be independent random variables with $E(W_i^2)/E(W_i)^2 \le B$
for $i \in [\ell]$. Let $W = W_1 \cdots W_\ell$. Let $S_i$ be the average of $16B\ell/\varepsilon^2$ independent random
samples from $W_i$ for $i \in [\ell]$. Let $S = S_1 S_2 \cdots S_\ell$. Then

$$\Pr\big((1-\varepsilon)E(W) \le S \le (1+\varepsilon)E(W)\big) \ge 3/4.$$

It will be convenient to rewrite $E(W^2)/E(W)^2$ for $W := W_{\beta,\beta'}$ in terms of the partition
function $Z$. We have

$$E(W^2) = \frac{1}{Z(\beta)} \sum_{\sigma \in \Omega} e^{-\beta H(\sigma)} e^{2(\beta - \beta') H(\sigma)} = \frac{Z(2\beta' - \beta)}{Z(\beta)},$$

and hence

$$\frac{E(W^2)}{E(W)^2} = \frac{Z(2\beta' - \beta)\, Z(\beta)}{Z(\beta')^2}. \tag{5}$$

Equation (5) motivates the following definition.

Definition 2.3. Let $B > 0$ be a constant. Let $Z$ be a partition function. Let $\beta_0, \dots, \beta_\ell$ be
a sequence of inverse temperatures such that $0 = \beta_0 < \beta_1 < \cdots < \beta_\ell = \infty$. The sequence
is called a $B$-Chebyshev cooling schedule for $Z$ if

$$\frac{Z(2\beta_{i+1} - \beta_i)\, Z(\beta_i)}{Z(\beta_{i+1})^2} \le B \tag{6}$$

for all $i = 0, \dots, \ell - 1$.
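Condition (6) is easy to check numerically when the coefficients are known. The sketch below is only an illustration under that assumption (in the intended applications $Z$ is not available explicitly); the helper names and the example coefficients are hypothetical. A step to $\beta_{i+1} = \infty$ is handled by using $Z(\infty) = a_0$.

```python
import math

def Z(coeffs, beta):
    """Z(beta) = sum_i a_i exp(-beta i); Z(infinity) = a_0."""
    if math.isinf(beta):
        return float(coeffs[0])
    return sum(a * math.exp(-beta * i) for i, a in enumerate(coeffs))

def chebyshev_ratio(coeffs, beta, beta_next):
    """Left-hand side of (6): Z(2 beta' - beta) Z(beta) / Z(beta')^2."""
    far = math.inf if math.isinf(beta_next) else 2.0 * beta_next - beta
    return Z(coeffs, far) * Z(coeffs, beta) / Z(coeffs, beta_next) ** 2

def is_chebyshev_schedule(coeffs, schedule, B):
    """True if the schedule runs from 0 to infinity, increases, and satisfies (6)."""
    if schedule[0] != 0.0 or not math.isinf(schedule[-1]):
        return False
    steps = list(zip(schedule, schedule[1:]))
    if any(not a < b for a, b in steps):
        return False
    return all(chebyshev_ratio(coeffs, a, b) <= B for a, b in steps)

coeffs = [2, 0, 12, 0, 2]  # hypothetical coefficients a_0, ..., a_4
print(chebyshev_ratio(coeffs, 0.0, math.inf))  # Z(0)/Z(infinity) = 16/2 = 8
print(is_chebyshev_schedule(coeffs, [0.0, math.inf], B=4.0))
print(is_chebyshev_schedule(coeffs, [0.0, 0.5, 1.0, 2.0, 4.0, math.inf], B=2.0))
```

The single step $0 \to \infty$ has ratio $Z(0)/Z(\infty)$, which is why a one-step schedule is only $B$-Chebyshev for $B \ge A / a_0$.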

The following bound on the number of samples is an immediate consequence of
Theorem 2.2.

Corollary 2.4. Let $Z$ be a partition function. Suppose that we are given a $B$-Chebyshev
cooling schedule $\beta_0, \dots, \beta_\ell$ for $Z$. Then, using $16B\ell^2/\varepsilon^2$ samples in total, we can compute
$S$ such that

$$P\big((1-\varepsilon)Z(\infty) \le S \le (1+\varepsilon)Z(\infty)\big) \ge 3/4.$$

3 Lower bound for non-adaptive schedules 

A cooling schedule will be called non-adaptive if it depends only on $n$ and $A = Z(0)$
and assumes $Z(\infty) \ge 1$. Thus, such a schedule does not depend on the structure of the
partition function.

The advantage of non-adaptive cooling schedules is that they do not need to be figured
out on the fly. An example of a non-adaptive Chebyshev cooling schedule that works for
any partition function of degree $n$, where $Z(0) = A$, is

$$0, \frac{1}{n}, \frac{2}{n}, \dots, \frac{n \ln A}{n}, \infty. \tag{7}$$

The idea behind the schedule (7) is that small changes in the inverse temperature result
in small changes of the partition function. We will state this observation more precisely,
since we will use it later.

Lemma 3.1. Let $\varepsilon > 0$ and let $\beta \le \beta' \le \beta + \varepsilon$. Let $Z$ be a partition function of degree $n$.
Then

$$Z(\beta) e^{-\varepsilon n} \le Z(\beta') \le Z(\beta). \tag{8}$$




Proof:

For $i \le n$ we have

$$e^{-\beta i} e^{-\varepsilon n} \le e^{-(\beta + \varepsilon) i} \le e^{-\beta' i} \le e^{-\beta i}. \tag{9}$$

Equation (8) now follows by applying (9) to each term of the $Z$'s in (8). ■

To see that (7) is a Chebyshev cooling schedule, note that, by Lemma 3.1, the random
variable $W_{\beta,\beta'}$ defined by (2) has values from the interval $[1/e, 1]$ if $0 \le \beta' - \beta \le 1/n$.
This implies that for $W := W_{\beta,\beta'}$ the left-hand side of (5) is bounded by a constant if
$\beta, \beta' < \infty$ are neighbors in (7). It remains to show that (5) is bounded for $\beta = \ln A$ and
$\beta' = \infty$. Note that $Z(\infty) \ge 1$ (since $a_0 \ge 1$) and

$$Z(\ln A) = a_0 + \sum_{i=1}^{n} a_i e^{-i \ln A} \le Z(\infty) + \frac{1}{A} \sum_{i=1}^{n} a_i \le Z(\infty) + 1,$$

and hence for the right-hand side of (5) we obtain

$$\frac{Z(\ln A)}{Z(\infty)} \le 2. \tag{10}$$



The length of the schedule (7) is $O(n \ln A)$. The following more efficient non-adaptive
Chebyshev cooling schedule of length $O((\ln A) \ln n)$ is given in [1]:

$$0, \frac{1}{n}, \frac{2}{n}, \dots, \frac{k}{n}, \frac{k\gamma}{n}, \frac{k\gamma^2}{n}, \dots, \frac{k\gamma^t}{n}, \infty, \tag{11}$$

where $k = \lceil \ln A \rceil$, $\gamma = 1 + \frac{1}{\ln A}$ and $t = \lceil (1 + \ln A) \ln n \rceil$. The schedule (11) is based on
the following observation (the statement of Lemma 3.2 slightly differs from [1] and hence
we include a short proof).

Lemma 3.2 ([1]). Let $Z$ be a partition function with $Z(0) = A$. Let $\beta > 0$ be an inverse
temperature and let $\beta' = \beta\left(1 + \frac{1}{\ln A}\right)$. Then

$$\frac{1}{2e} Z(\beta) \le Z(\beta').$$

Proof:

Let $n$ be the degree of $Z$. First assume that $a_n e^{-\beta n} \ge 1$. We have $a_n \le Z(0) = A$ and
hence $\beta \le \frac{\ln A}{n}$. This implies $\beta' \le \beta + \frac{1}{n}$ and we can use Lemma 3.1.

Now assume $a_n e^{-\beta n} < 1$. Let $k \in \{0, \dots, n\}$ be the smallest such that

$$\sum_{i=k}^{n} a_i e^{-\beta i} < 1. \tag{12}$$

Note that $k \ge 1$, since $a_0 \ge 1$. From the minimality of $k$ we obtain

$$A e^{-\beta(k-1)} \ge \sum_{i=k-1}^{n} a_i e^{-\beta i} \ge 1,$$

and hence $\beta(k-1) \le \ln A$. Hence for $i \le k-1$ we have $\beta' i \le \beta i + 1$. Now

$$Z(\beta) \le 1 + \sum_{i=0}^{k-1} a_i e^{-\beta i}, \tag{13}$$

and

$$Z(\beta') \ge \sum_{i=0}^{k-1} a_i e^{-\beta' i} \ge \sum_{i=0}^{k-1} a_i e^{-\beta i - 1} = \frac{1}{e} \sum_{i=0}^{k-1} a_i e^{-\beta i}. \tag{14}$$

Combining (13) and (14) we obtain the result. ■
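To compare the two non-adaptive schedules concretely, the following sketch builds (7) and (11) for given $n$ and $A$ and reports their lengths together with the largest value of the left-hand side of (6). It assumes $k = \lceil \ln A \rceil$, $\gamma = 1 + 1/\ln A$ and $t = \lceil (1 + \ln A) \ln n \rceil$ for (11), rounds the last finite point of (7) up to the next multiple of $1/n$ when $n \ln A$ is not an integer, and uses hypothetical example coefficients.

```python
import math

def Z(coeffs, beta):
    """Z(beta) = sum_i a_i exp(-beta i); Z(infinity) = a_0."""
    if math.isinf(beta):
        return float(coeffs[0])
    return sum(a * math.exp(-beta * i) for i, a in enumerate(coeffs))

def max_chebyshev_ratio(coeffs, schedule):
    """Largest value of Z(2b' - b) Z(b) / Z(b')^2 over consecutive pairs (b, b')."""
    worst = 0.0
    for beta, beta_next in zip(schedule, schedule[1:]):
        far = math.inf if math.isinf(beta_next) else 2.0 * beta_next - beta
        ratio = Z(coeffs, far) * Z(coeffs, beta) / Z(coeffs, beta_next) ** 2
        worst = max(worst, ratio)
    return worst

def schedule_7(n, A):
    """0, 1/n, 2/n, ... up to the first multiple of 1/n that is >= ln A, then infinity."""
    last = math.ceil(n * math.log(A))
    return [i / n for i in range(last + 1)] + [math.inf]

def schedule_11(n, A):
    """0, 1/n, ..., k/n, k*gamma/n, ..., k*gamma^t/n, infinity."""
    k = math.ceil(math.log(A))
    gamma = 1.0 + 1.0 / math.log(A)
    t = math.ceil((1.0 + math.log(A)) * math.log(n))
    linear = [i / n for i in range(k + 1)]
    geometric = [k * gamma ** s / n for s in range(1, t + 1)]
    return linear + geometric + [math.inf]

coeffs = [2, 0, 12, 0, 2]  # hypothetical coefficients, degree n = 4, A = 16
n, A = len(coeffs) - 1, Z(coeffs, 0.0)
s7, s11 = schedule_7(n, A), schedule_11(n, A)
print(len(s7), max_chebyshev_ratio(coeffs, s7))
print(len(s11), max_chebyshev_ratio(coeffs, s11))
```

For such a tiny instance the difference in length is small; the gap between $n \ln A$ and $(\ln A) \ln n$ only becomes pronounced for large $n$.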

Next we show that the schedule (11) is the best possible up to a constant factor. We
will see later that adaptive cooling schedules can be much shorter.

Lemma 3.3. Let $n \in \mathbb{Z}^+$ and $A, B \in \mathbb{R}^+$. Let $S = \beta_0, \beta_1, \dots, \beta_\ell$ be a non-adaptive
$B$-Chebyshev cooling schedule which works for all partition functions of degree at most $n$
with $Z(0) = A$ and $Z(\infty) \ge 1$. Assume $\beta_0 = 0$ and $\beta_\ell = \infty$. Then

$$\ell \ge \ln(n/e) \left( \frac{\ln(A-1)}{\ln(4B)} - 1 \right). \tag{15}$$

In the proof of Lemma 3.3 we will need the following bound on the first step of the
cooling schedule.

Lemma 3.4. Assume $A - 1 > 4B$. Then

$$\beta_1 \le \frac{\ln(4B)}{n}. \tag{16}$$

Proof of Lemma 3.4:

Let $0 < a \le A - 1$. Then $S$ has to be a $B$-Chebyshev cooling schedule for

$$Z(\beta) = \frac{A}{1+a}\left(1 + a e^{-\beta n}\right).$$

The equation (6) needs to be satisfied for $Z$, $\beta_0 = 0$ and $\beta_1$. Thus

$$\frac{(1 + a e^{-2\beta_1 n})(1 + a)}{(1 + a e^{-\beta_1 n})^2} \le B. \tag{17}$$

After the substitution $z = e^{-\beta_1 n}$, equation (17) becomes equivalent to

$$\frac{(1 + a z^2)(1 + a)}{(1 + a z)^2} = 1 + \frac{a(1-z)^2}{(1 + a z)^2} \le B. \tag{18}$$

Suppose that $z < \frac{1}{A-1}$. Note that the left-hand side of (18) is decreasing in $z$. Hence, (18)
is true for $z = \frac{1}{A-1}$. Let $a = A - 1$. For this choice of $a$ and $z$, (18) yields $(A-1)/4 \le B$,
a contradiction with $A > 4B + 1$. Thus, $z \ge \frac{1}{A-1}$.

Since $z \ge 1/(A-1)$, we have $1/z \le A - 1$ and, hence, we can choose $a = 1/z$. Plugging
$a = 1/z$ into (18) we obtain

$$\frac{(1+z)^2}{4z} \le B, \tag{19}$$

and, hence, $z \ge 1/(4B)$, which implies (16). ■

Lemma 3.4 immediately gives a bound on the later steps in the schedule. If the
current inverse temperature is $\beta_i$ then the coefficient of degree $k$ is decimated by $e^{-\beta_i k}$.

Corollary 3.5. Let $k \in \{1, \dots, n\}$. Assume $(A - 1)e^{-\beta_i k} > 4B$. Then

$$\beta_{i+1} - \beta_i \le \frac{\ln(4B)}{k}.$$

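Proof of Lemma 3.3:

Let $S' = \beta_0, \beta_1, \dots, \beta_\ell$ be the shortest sequence such that $\beta_0 = 0$, $\beta_\ell = \infty$ and
Corollary 3.5 is satisfied for $S'$.

We can greedily construct the shortest sequence $S'$ as follows. If $k \in \{1, \dots, n\}$ is the
largest such that $(A - 1)e^{-\beta_i k} > 4B$ then we take

$$\beta_{i+1} = \beta_i + \frac{\ln(4B)}{k}.$$

(If $(A - 1)e^{-\beta_i} \le 4B$ then we take $\beta_{i+1} = \infty$.)

Let $x_k$ be the number of indices $i$ for which $\beta_{i+1} - \beta_i = \frac{\ln(4B)}{k}$. Let $j \in \{2, \dots, n\}$ and

$$\beta := \sum_{k=j}^{n} x_k \frac{\ln(4B)}{k}. \tag{20}$$

From $\beta$ we take a step of length at least $\frac{\ln(4B)}{j-1}$ (since we already took all shorter steps)
and hence

$$(A - 1)e^{-\beta j} \le 4B. \tag{21}$$

Plugging (20) into (21) we obtain

$$\ln(4B) \sum_{k=j}^{n} \frac{x_k}{k} \ge \frac{1}{j} \ln\frac{A-1}{4B}. \tag{22}$$

Summing (22) for $j = 2, \dots, n$ we obtain

$$\ln(4B) \sum_{k=2}^{n} x_k \ge \left( \sum_{j=2}^{n} \frac{1}{j} \right) \ln\frac{A-1}{4B} \ge \ln\left(\frac{n}{e}\right) \ln\frac{A-1}{4B},$$

which implies (15). ■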

The number of samples needed in Theorem 2.2 (and Corollary 2.4) is linear in $B$ and
hence, in view of Lemma 3.3, the optimal value of $B$ is a constant. Our understanding
of non-adaptive schedules is now complete up to a constant factor. In particular, the
schedule (11) and Lemma 3.3 imply that the optimal non-adaptive schedule has length
$\Theta((\ln A) \ln n)$.

We would like to have a similar understanding of adaptive cooling schedules. A
reasonable conjecture is that the optimal adaptive schedule has length

$$\Theta\left(\sqrt{(\ln A) \ln n}\right). \tag{23}$$






We will present an adaptive schedule of length $O\left(\sqrt{\ln A}\,(\ln n) \ln\ln A\right)$. This comes
reasonably close to our guess in (23) (in fact, in our applications we are only off by polylogarithmic
factors).

We will have the following technical assumptions on $A$ and $n$.

$$\ln n \ge 1, \quad \ln\ln A \ge 1, \quad \text{and} \quad A \ge \ln n. \tag{24}$$

The first two assumptions are necessary since both $\ln n$ and $\ln\ln A$ figure in our bounds
on the length of the schedule. The third assumption is justified for the following two
reasons. First, in the applications we consider, $A$ is usually exponential in $n$. Second, if $A$
is too small then no cooling schedule is necessary: a direct application of the Monte Carlo
method uses only $A/\varepsilon^2$ samples (which, for $A \le \ln n$, is less than the number of samples
needed by a cooling schedule of length given by (23)).



4 Adaptive cooling schedules 

In this section, we prove the existence of short adaptive cooling schedules for general
partition functions. We now formally state the result (to simplify the exposition we will
choose $B = e^2$; the construction works for any $B$).

Theorem 4.1. Let $Z$ be a partition function of degree $n$. Let $A = Z(0)$. Assume that
$Z(\infty) \ge 1$. There exists an $e^2$-Chebyshev cooling schedule $S$ for $Z$ whose length is at most

$$4 (\ln\ln A) \sqrt{(\ln A) \ln n}.$$

It will be convenient to define $f(\beta) = \ln Z(\beta)$. Some useful properties of $f$ are
summarized in the next lemma. We include a short proof in Section 6.

Lemma 4.2. Let $f(\beta) = \ln Z(\beta)$ where $Z$ is a partition function of degree $n$. Then (a) $f$
is decreasing, (b) $f'$ is increasing (i.e., $f$ is convex), (c) $f'(0) \ge -n$.

Recall that an $e^2$-Chebyshev cooling schedule for $Z$ is a sequence of inverse
temperatures $\beta_0, \beta_1, \dots, \beta_\ell$ such that $\beta_0 = 0$, $\beta_\ell = \infty$, and

$$\frac{Z(2\beta_{i+1} - \beta_i)\, Z(\beta_i)}{Z(\beta_{i+1})^2} \le e^2. \tag{25}$$

Since (25) is invariant under scaling we can, without loss of generality, assume $Z(\infty) = 1$
(or equivalently $a_0 = 1$). Since we assumed $a_0 \ge 1$ the scaling will not increase $Z(0)$.

Let $f(\beta) = \ln Z(\beta)$, so that $f(0) = \ln A$, and $f(\infty) = 0$. The condition (25) is
equivalent to

$$\frac{f(2\beta_{i+1} - \beta_i) + f(\beta_i)}{2} - f(\beta_{i+1}) \le 1. \tag{26}$$

If we substitute $x = \beta_i$ and $y = 2\beta_{i+1} - \beta_i$, the condition can be rewritten as

$$f\left(\frac{x+y}{2}\right) \ge \frac{f(x) + f(y)}{2} - 1.$$



In words, $f$ satisfies approximate concavity. The main idea of the proof is that we do
not require this property to hold everywhere but only in a sparse subset of points which
will correspond to the cooling schedule. A similar viewpoint is that we will show that $f$
can be approximated by a piecewise linear function $g$ with few pieces; see Figure 1 for an
illustration. We form the segments of $g$ in the following inductive, greedy manner. Let $\gamma_i$
denote the endpoint of the last segment. We then set $\gamma_{i+1}$ as the maximum value such
that the midpoint $m_i$ of the segment $(\gamma_i, \gamma_{i+1})$ satisfies (26) (for $\beta_i = \gamma_i$, $\beta_{i+1} = m_i$). We
now formally state the lemma on the approximation of $f$ by a piecewise linear function.

Lemma 4.3. Let $f : [0, \gamma] \to \mathbb{R}$ be a decreasing, convex function. There exists a sequence
$\gamma_0 = 0 < \gamma_1 < \cdots < \gamma_j = \gamma$ such that for all $i \in \{0, \dots, j-1\}$,

$$f\left(\frac{\gamma_i + \gamma_{i+1}}{2}\right) \ge \frac{f(\gamma_i) + f(\gamma_{i+1})}{2} - 1, \tag{27}$$

and

$$j \le 1 + \sqrt{(f(0) - f(\gamma)) \ln\frac{f'(0)}{f'(\gamma)}}.$$

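Proof:

Let $\gamma_0 := 0$. Suppose that we already constructed the sequence up to $\gamma_i$. Let $\gamma_{i+1}$ be the
largest number from the interval $[\gamma_i, \gamma]$ such that (27) is satisfied. Let $m_i = (\gamma_i + \gamma_{i+1})/2$,
let $\Delta_i = (\gamma_{i+1} - \gamma_i)/2$, and $K_i = f(\gamma_i) - f(\gamma_{i+1})$.

If $\gamma_{i+1} = \gamma$ then we are done constructing the sequence. Otherwise, by the maximality
of $\gamma_{i+1}$, we have

$$f(m_i) = \frac{f(\gamma_i) + f(\gamma_{i+1})}{2} - 1. \tag{28}$$

Using the convexity of $f$ and (28) we obtain

$$-f'(\gamma_i) \ge \frac{f(\gamma_i) - f(m_i)}{\Delta_i} = \frac{K_i + 2}{2\Delta_i}, \tag{29}$$

and

$$-f'(\gamma_{i+1}) \le \frac{f(m_i) - f(\gamma_{i+1})}{\Delta_i} = \frac{K_i - 2}{2\Delta_i}. \tag{30}$$

Combining (29) and (30) we obtain

$$\frac{f'(\gamma_{i+1})}{f'(\gamma_i)} = \frac{-f'(\gamma_{i+1})}{-f'(\gamma_i)} \le \frac{K_i - 2}{K_i + 2} = 1 - \frac{4}{K_i + 2}. \tag{31}$$

From (30) and the fact that $f$ is decreasing we obtain $K_i \ge 2$. Hence we can estimate (31)
as follows

$$\frac{f'(\gamma_{i+1})}{f'(\gamma_i)} \le 1 - \frac{1}{K_i} \le e^{-1/K_i}. \tag{32}$$

Since $f$ is decreasing, we have

$$\sum_{i=0}^{j-2} K_i \le f(0) - f(\gamma). \tag{33}$$

Now we combine (32) for all $i \in \{0, \dots, j-2\}$.

$$\sum_{i=0}^{j-2} \frac{1}{K_i} \le \ln\frac{f'(0)}{f'(\gamma)}. \tag{34}$$
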
Figure 1: The light curve is $f(x) = \ln Z(x)$ for the partition function $Z(x) = (1 + \exp(-x))^{20}$.
The dark curve is a piecewise linear function $g$ consisting of 3 pieces which
approximates $f$. In particular, $g \ge f$ and the midpoint of each piece is close to the average
of the endpoints (specifically, (25) holds).



Applying the Cauchy-Schwarz inequality to (33) and (34) we obtain

$$(j-1)^2 \le (f(0) - f(\gamma)) \ln\frac{f'(0)}{f'(\gamma)}.$$



The construction immediately yields a natural cooling schedule. A schedule ending at
$\beta_k = \gamma_i$ can now be extended by $\beta_{k+1} = m_i$ where $m_i$ is the midpoint of the segment
$(\gamma_i, \gamma_{i+1})$. Moreover, we can then set $\beta_{k+2}$ as the midpoint of $(m_i, \gamma_{i+1})$. We continue in
this geometric manner for at most $\ln\ln A$ steps, after which we can set the next inverse
temperature in our schedule to $\gamma_{i+1}$. Then we continue on the next segment. It then
follows that the length $\ell$ of the cooling schedule satisfies $\ell \le j \ln\ln A$ where $j$ is the length
of the sequence from Lemma 4.3. We now present the proof of Theorem 4.1.
Proof of Theorem 4.1:

Let $\gamma$ be such that $f(\gamma) = 1$. We describe a sequence $\beta_0 = 0 < \beta_1 < \dots < \beta_\ell = \gamma$ satisfying
(26). Note that since $f(\gamma) = 1$, we can take $\beta_{\ell+1} = \infty$ and the sequence will still satisfy
(26) (and thus we get a complete $e^2$-Chebyshev cooling schedule for $Z$). We have

$$Z(\gamma) = \exp(f(\gamma)) = \sum_{i=0}^{n} a_i e^{-i\gamma} = e,$$

and, hence, (using $a_0 = 1$)

$$-Z'(\gamma) = \sum_{i=0}^{n} i\, a_i e^{-i\gamma} \ge e - 1.$$

Thus

$$-f'(\gamma) = -(\ln Z)'(\gamma) = \frac{-Z'(\gamma)}{Z(\gamma)} \ge \frac{e-1}{e}. \tag{35}$$



By Lemma 4.3 there exists a sequence $\gamma_0 = 0 < \gamma_1 < \cdots < \gamma_j = \gamma$ of length

$$j \le 1 + \sqrt{(\ln A) \ln\frac{en}{e-1}} \tag{36}$$

such that (27) is satisfied.

Now we show how to add $\lceil \ln\ln A \rceil$ inverse temperatures between each pair $\gamma_i$ and $\gamma_{i+1}$
to obtain our cooling schedule. For notational convenience we show this only for $\gamma_0 = 0$
and $\gamma_1$.

Note that (27) implies that (26) is satisfied for $\beta_0 = 0$ and $\beta_1 = \gamma_1/2$. We now show
that

$$0,\ (1/2)\gamma_1,\ (3/4)\gamma_1,\ (7/8)\gamma_1,\ \dots,\ (1 - 2^{-\lceil \ln\ln A \rceil})\gamma_1,\ \gamma_1$$

is an $e^2$-Chebyshev cooling schedule. Let

$$g(x) := f\left(\frac{x + \gamma_1}{2}\right) - \frac{f(x) + f(\gamma_1)}{2}.$$

Note that by (28) we have $g(0) = -1$. We have

$$g'(x) = \frac{1}{2}\left( f'\left(\frac{x + \gamma_1}{2}\right) - f'(x) \right).$$

Thus

$$\text{if } x \le \gamma_1 \text{ we have } g'(x) \ge 0, \tag{37}$$

and, hence,

$$g(x) \ge g(0) = -1.$$

Plugging in $x = (1 - 2^{-t})\gamma_1$ we conclude

$$f((1 - 2^{-t-1})\gamma_1) \ge \frac{f((1 - 2^{-t})\gamma_1) + f(\gamma_1)}{2} - 1. \tag{38}$$

From (28) and (38) it follows that the sequence

$$0,\ (1/2)\gamma_1,\ (3/4)\gamma_1,\ (7/8)\gamma_1,\ \dots,\ (1 - 2^{-t})\gamma_1,\ \dots \tag{39}$$

satisfies (26). We will now show that we can truncate the sequence at $t = \lceil \ln\ln A \rceil$ and
take the next step to $\gamma_1$.

By the convexity of $f$

$$f((1 - 2^{-t-1})\gamma_1) \le \frac{f((1 - 2^{-t})\gamma_1) + f(\gamma_1)}{2},$$

and hence

$$f((1 - 2^{-t-1})\gamma_1) - f(\gamma_1) \le \frac{f((1 - 2^{-t})\gamma_1) - f(\gamma_1)}{2}. \tag{40}$$

The equation (40) states that the distance of $f((1 - 2^{-t})\gamma_1)$ from $f(\gamma_1)$ halves in each
step. Recall that $f(\gamma_1/2) - f(\gamma_1) \le f(0) \le \ln A$ and, hence, for $t := \lceil \ln\ln A \rceil$ we have

$$f((1 - 2^{-t})\gamma_1) - f(\gamma_1) \le 1. \tag{41}$$

This completes the construction of the cooling schedule. The length of the schedule is
$\le jt$. Plugging in (36) yields the theorem.



The optimal Chebyshev cooling schedule can be obtained in a greedy manner. In
particular, we start with $\beta_0 = 0$, and then from $\beta_i$ we choose the maximum $\beta_{i+1}$ for which
(26) is satisfied. The reason why the greedy strategy works is that if we can step from $\beta$ to
$\beta'$, then for any $\gamma \in [\beta, \beta']$ we can step from $\gamma$ to $\beta'$ (i.e., having large inverse temperature
cannot hurt us). The last fact follows from the convexity of $f$ (or alternatively from (37)).
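The greedy rule can be sketched as follows when $f = \ln Z$ can be evaluated exactly, which is an assumption made purely for illustration (the algorithm of Section 5 has only sampling access). From the current $\beta$ the sketch finds the largest $\beta'$ with $\frac{f(\beta) + f(2\beta' - \beta)}{2} - f(\beta') \le 1$ by bisection; this relies on the left-hand side being increasing in $\beta'$, which holds because $f'$ is increasing. The function names, the tolerance and the example partition function $(1 + e^{-\beta})^{20}$ are hypothetical choices.

```python
import math

def f(coeffs, beta):
    """f(beta) = ln Z(beta); f(infinity) = ln a_0."""
    if math.isinf(beta):
        return math.log(coeffs[0])
    return math.log(sum(a * math.exp(-beta * i) for i, a in enumerate(coeffs)))

def step_cost(coeffs, x, y):
    """(f(x) + f(2y - x)) / 2 - f(y); a step x -> y is allowed if this is <= 1."""
    far = math.inf if math.isinf(y) else 2.0 * y - x
    return (f(coeffs, x) + f(coeffs, far)) / 2.0 - f(coeffs, y)

def greedy_schedule(coeffs, tol=1e-9):
    """Greedy e^2-Chebyshev schedule: always take the largest allowed step."""
    schedule = [0.0]
    while not math.isinf(schedule[-1]):
        x = schedule[-1]
        if step_cost(coeffs, x, math.inf) <= 1.0:
            schedule.append(math.inf)
            continue
        lo, hi = x, x + 1.0
        while step_cost(coeffs, x, hi) <= 1.0:  # grow until the step is too long
            lo, hi = hi, x + 2.0 * (hi - x)
        while hi - lo > tol:                    # bisect on the monotone cost
            mid = (lo + hi) / 2.0
            if step_cost(coeffs, x, mid) <= 1.0:
                lo = mid
            else:
                hi = mid
        if lo <= x:
            raise RuntimeError("allowed step is smaller than the tolerance")
        schedule.append(lo)
    return schedule

n = 20
coeffs = [math.comb(n, i) for i in range(n + 1)]  # Z(beta) = (1 + exp(-beta))^n
schedule = greedy_schedule(coeffs)
print(len(schedule), schedule)
```

Because bisection returns a point at which the step condition still holds, every consecutive pair in the output satisfies (26) up to floating-point error.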

Corollary 4.4. Let $Z$ be a partition function of degree $n$. Let $A = Z(0)$. Assume that
$Z(\infty) \ge 1$. Suppose that $\beta_0 < \cdots < \beta_\ell$ is a cooling schedule for $Z$. Then the number of
indices $i$ for which

$$\frac{Z(2\beta_{i+1} - \beta_i)\, Z(\beta_i)}{Z(\beta_{i+1})^2} \ge e^2 \tag{42}$$

is at most $4 (\ln\ln A) \sqrt{(\ln A) \ln n}$.

We now formally prove the greedy property of Chebyshev cooling schedules. Note that
we can make a step from $x$ to $y$ if $g(x,y) \le 1$, where

$$g(x,y) = \frac{f(x) + f(2y - x)}{2} - f(y). \tag{43}$$

Lemma 4.5. Let $Z$ be a partition function. Let $f = \ln Z(\beta)$ and let $g$ be given by (43).
The function $g(x,y)$ is decreasing in $x$ for $x \le y$. The function $g(x,y)$ is increasing in $y$
for $x \le y$.

Proof:

By Lemma 4.2 we have that $f'$ is an increasing function. We have $2y - x \ge x$ and hence

$$\frac{\partial g(x,y)}{\partial x} = \frac{f'(x) - f'(2y - x)}{2} \le 0.$$

Analogously

$$\frac{\partial g(x,y)}{\partial y} = f'(2y - x) - f'(y) \ge 0.$$

■

Proof of Corollary 4.4:

Let $k_0 < k_1 < \cdots < k_m$ be the indices for which (42) is satisfied. Let $\alpha_0 = 0 < \alpha_1 <
\cdots < \alpha_\ell = \infty$ be the optimal $e^2$-Chebyshev cooling schedule. We are going to show, using
induction on $j$, that

$$\alpha_j \le \beta_{k_j}. \tag{44}$$

Clearly (44) is true for $j = 0$.

Now assume (44) is true for some $j$, and let $i = k_j$. We have $g(\alpha_j, \alpha_{j+1}) \le 1$, $g(\beta_i, \beta_{i+1}) \ge 1$, and
$\alpha_j \le \beta_i$. From Lemma 4.5 it follows that $\alpha_{j+1} \le \beta_{i+1}$ and hence

$$\alpha_{j+1} \le \beta_{i+1} \le \beta_{k_{j+1}},$$

completing the induction step.

Equation (44) implies $m \le \ell \le 4 (\ln\ln A) \sqrt{(\ln A) \ln n}$. ■

4.1 Extensions 

The key property of $Z(\beta)$ used in the proof of existence of a short cooling schedule is the
fact that it is logconvex (i.e., its logarithm, $f(\beta) = \ln Z(\beta)$, is convex). The proof above
can be appropriately modified for other function classes with this property. We highlight
this for a class of continuous functions.

Lemma 4.6. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a continuous, integrable, nonnegative function. Define

$$Z(\beta) = \int_{\mathbb{R}^n} g(x)^{\beta}\, dx$$

for $\beta \ge 0$. Then $Z(\beta)$ is logconvex.

The proof is identical to that of Lemma 4.2, part (b).

4.2 Lower bound for adaptive cooling 

Lemma 4.7. Let $n \ge 1$. Consider the following partition function of degree $n$:

$$Z(\beta) = (1 + e^{-\beta})^n.$$

Any $B$-Chebyshev cooling schedule for $Z(\beta)$ has length at least $\sqrt{n/(20 \ln B)}$.

Proof:

Let $f(\beta) = \ln Z(\beta) = n \ln(1 + e^{-\beta})$. If the current inverse temperature is $\beta_i =: \beta$, the next
inverse temperature $\beta_{i+1} =: \beta + x$ has to satisfy

$$f(\beta) + f(\beta + 2x) - 2f(\beta + x) \le \ln B.$$

Later we will show that for any $\beta \in [0, 1]$ and $x \in [0, 1]$ we have

$$f(\beta) + f(\beta + 2x) - 2f(\beta + x) \ge \frac{n}{20} x^2. \tag{45}$$

From (45) it follows that for $\beta \le 1$ the inverse temperature increases by at most

$$x \le \sqrt{\frac{20 \ln B}{n}}$$

and, hence, the length of the schedule is at least $\sqrt{n/(20 \ln B)}$.
It remains to show (45). Let

$$g(x, \beta) := \frac{f(\beta) + f(\beta + 2x) - 2f(\beta + x)}{2n}.$$
We have

$$\frac{\partial}{\partial x} g(x, \beta) = \frac{e^{-\beta - x}}{1 + e^{-\beta - x}} - \frac{e^{-\beta - 2x}}{1 + e^{-\beta - 2x}}.$$

We will show

$$\frac{e^{-\beta - x}}{1 + e^{-\beta - x}} - \frac{e^{-\beta - 2x}}{1 + e^{-\beta - 2x}} \ge x/20, \tag{46}$$

which will imply (45) (by integration over $x$).

Let $C := e^{-\beta}$ and $y := 1 - e^{-x}$. Note that $C \in [1/e, 1]$, $y \in [0, 1 - 1/e]$, and $x =
-\ln(1 - y)$. For $y \in [0, 1 - 1/e]$ we have $-\ln(1 - y) \le y + y^2$ and hence it is enough to
show

$$\frac{C(1-y)}{1 + C(1-y)} - \frac{C(1-y)^2}{1 + C(1-y)^2} \ge \frac{1}{20}(y + y^2). \tag{47}$$

Multiplying both sides by 20 and by the denominators we obtain that (47) is equivalent to

$$P(y, C) := y(y+1)(y-1)^3 C^2 - (y^4 - 2y^3 + 19y^2 - 18y)C - (y^2 + y) \ge 0.$$

The polynomial $y(y+1)(y-1)^3$ is negative for our range of $y$ and hence for any fixed $y$,
the minimum of $P(y, C)$ over $C \in [1/3, 1]$ occurs either at $C = 1$ or at $C = 1/3$ (we only
need to show non-negativity of $P(y, C)$ for $C \in [1/e, 1]$, but for numerical convenience we show
it for a larger interval). We have

$$P(y, 1) = y^5 - 3y^4 + 2y^3 - 18y^2 + 16y, \tag{48}$$

and

$$9P(y, 1/3) = y^5 - 5y^4 + 6y^3 - 64y^2 + 44y. \tag{49}$$

Both (48) and (49) are non-negative for our range of $y$ (as is readily seen by the method
of Sturm sequences). This finishes the proof of (46), which in turn implies (45). ■
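The polynomial claims can be spot-checked numerically. The sketch below is not a substitute for the Sturm-sequence argument; it only evaluates (48), (49) and the gap in (46) on a finite grid, with the grid resolution and function names chosen arbitrarily for this illustration.

```python
import math

def P(y, C):
    """P(y, C) as defined in the proof of Lemma 4.7."""
    return (y * (y + 1) * (y - 1) ** 3 * C ** 2
            - (y ** 4 - 2 * y ** 3 + 19 * y ** 2 - 18 * y) * C
            - (y ** 2 + y))

def p48(y):
    """Right-hand side of (48), claimed to equal P(y, 1)."""
    return y ** 5 - 3 * y ** 4 + 2 * y ** 3 - 18 * y ** 2 + 16 * y

def p49(y):
    """Right-hand side of (49), claimed to equal 9 P(y, 1/3)."""
    return y ** 5 - 5 * y ** 4 + 6 * y ** 3 - 64 * y ** 2 + 44 * y

def lhs_46(beta, x):
    """Left-hand side of (46)."""
    a = math.exp(-beta - x)
    b = math.exp(-beta - 2 * x)
    return a / (1 + a) - b / (1 + b)

steps = 200
ys = [(1 - 1 / math.e) * i / steps for i in range(steps + 1)]  # y in [0, 1 - 1/e]
grid = [i / steps for i in range(steps + 1)]                    # beta, x in [0, 1]

min_48 = min(p48(y) for y in ys)
min_49 = min(p49(y) for y in ys)
min_gap_46 = min(lhs_46(b, x) - x / 20 for b in grid for x in grid)
max_mismatch = max(max(abs(P(y, 1.0) - p48(y)), abs(9 * P(y, 1 / 3) - p49(y))) for y in ys)
print(min_48, min_49, min_gap_46, max_mismatch)
```

All three minima are attained at the boundary point $y = 0$ (respectively $x = 0$), where the expressions vanish.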



5 An adaptive cooling algorithm 

The main theorem of the previous section proves the existence of a short adaptive cooling
schedule, whereas in Section 3 we proved any non-adaptive cooling schedule is much longer.
In this section, we present an adaptive algorithm to find a short cooling schedule. We state
the main result before describing the details of the algorithm. The algorithm has access
to a sampling oracle, which on input $\beta$ produces a random sample from the distribution
$\mu_\beta$, defined by (1) (or a distribution sufficiently close to $\mu_\beta$).

Theorem 5.1. Let $Z$ be a partition function. Assume that we have access to an
(approximate) sampling oracle from $\mu_\beta$ for any inverse temperature $\beta$. Let $\delta' > 0$. With
probability at least $1 - \delta'$, algorithm Print-Cooling-Schedule outputs a $B$-Chebyshev
cooling schedule for $Z$ (with $B = 3 \cdot 10^6$), where the length of the schedule is at most

$$\ell \le 38 \sqrt{\ln A}\,(\ln n) \ln\ln A. \tag{50}$$

The algorithm uses at most

$$Q \le 10^7 (\ln A)\big((\ln n) + \ln\ln A\big)^5 \ln\frac{1}{\delta'} \tag{51}$$

samples from the $\mu_\beta$-oracles. The samples output by the oracles have to be from a
distribution which is within variation distance $\le \delta'/(2Q)$ from $\mu_\beta$.

In Section 7 we extend the algorithm to the setting of warm-start sampling oracles
(see Theorem 7.5).



5.1 High-level Algorithm Description 

We begin by presenting the high-level idea of our algorithm. Ideally we would like to find
a sequence $\beta_0 = 0 < \beta_1 < \cdots < \beta_\ell = \infty$ such that, for some constants $1 < c_1 < c_2$, for all
$i$, the random variable $W := W_{\beta_i,\beta_{i+1}}$ satisfies

$$c_1 \le \frac{E(W^2)}{E(W)^2} \le c_2. \tag{52}$$

The upper bound in (52) is necessary so that Chebyshev's inequality guarantees that few
samples of $W$ are required to obtain a close estimate of the ratio $Z(\beta_i)/Z(\beta_{i+1})$. On the
other side, the lower bound would imply that the length of the cooling schedule is close
to optimal. We will guarantee the upper bound for every pair of inverse temperatures,
but we will only obtain the lower bound for a sizable fraction of the pairs. Then, using
Corollary 4.4, we will argue that the schedule is short.

During the course of the algorithm we will try to find the next inverse temperature
so that (52) is satisfied. For this we will need to estimate
$u = u(\beta_i, \beta_{i+1}) := E(W^2)/E(W)^2$. We already have an expression for $u$, given by
equation (5):

$$u = \frac{E(W^2)}{E(W)^2} = \frac{Z(2\beta_{i+1} - \beta_i)\,Z(\beta_i)}{Z(\beta_{i+1})^2} = \frac{Z(2\beta_{i+1} - \beta_i)}{Z(\beta_{i+1})} \cdot \frac{Z(\beta_i)}{Z(\beta_{i+1})}. \tag{53}$$

Hence, to estimate $u$ it suffices to estimate the ratios $Z(2\beta_{i+1} - \beta_i)/Z(\beta_{i+1})$
and $Z(\beta_i)/Z(\beta_{i+1})$. Recall that the goal of estimating $u$ was to show that $W$ is an
efficient estimator of $Z(\beta_i)/Z(\beta_{i+1})$. Now it seems that to estimate $u$ we already
need a good estimator for $W$. An important component of our algorithm, which allows us to escape
from this circular loop, is a rough estimator for $u$ which bypasses $W$.
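As a small numerical illustration of (53), the following sketch evaluates $u$ in two ways on a toy partition function $Z(\beta) = \sum_k a_k e^{-\beta k}$: once from the right-hand side of (53), and once from the moments of $W$. The coefficient vector `a` and all function names are hypothetical, and we assume the convention $W = \exp((\beta_i - \beta_{i+1})H(X))$ with $X \sim \mu_{\beta_i}$, which is the convention consistent with (53).

```python
import math

def Z(a, beta):
    """Toy partition function Z(beta) = sum_k a[k] * exp(-beta * k)."""
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def u_from_Z(a, b1, b2):
    """Right-hand side of (53): Z(2*b2 - b1) * Z(b1) / Z(b2)**2."""
    return Z(a, 2 * b2 - b1) * Z(a, b1) / Z(a, b2) ** 2

def u_from_moments(a, b1, b2):
    """E(W^2)/E(W)^2 for W = exp((b1 - b2) * H(X)), X ~ mu_{b1} (assumed convention)."""
    z1 = Z(a, b1)
    probs = [ak * math.exp(-b1 * k) / z1 for k, ak in enumerate(a)]
    ew = sum(p * math.exp((b1 - b2) * k) for k, p in enumerate(probs))
    ew2 = sum(p * math.exp(2 * (b1 - b2) * k) for k, p in enumerate(probs))
    return ew2 / ew ** 2

a = [1, 4, 6, 4, 1]  # hypothetical coefficients, so A = Z(0) = 16
print(u_from_Z(a, 0.3, 0.8), u_from_moments(a, 0.3, 0.8))
```

The two printed values agree up to floating-point error; the algorithm itself never has access to `Z`, which is why a sampling-based rough estimator is needed.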

Recall, the Hamiltonian $H$ takes values in $\{0, 1, \dots, n\}$. For the purposes of estimating
$u$ it will suffice to know the Hamiltonian within some relative accuracy. Thus, we partition
$\{0, 1, \dots, n\}$ into (discrete) intervals of roughly equivalent values of the Hamiltonian.
Since we need relative accuracy, the size of the interval is smaller for smaller values of the
Hamiltonian (specifically, value $i$ is in an interval of size about $i/\sqrt{\ln A}$). We let $P$
denote the set of intervals. We will define the intervals so that the number of intervals $|P|$ is
at most $O(\sqrt{\ln A}\,\ln n)$.

The rough estimator for $u$ needs an interval $I = [b, c] \subseteq \{1, \dots, n\}$ which
contributes a significant portion to $Z(\beta)$ for all $\beta \in [\beta_i, 2\beta_{i+1} - \beta_i]$.
We say such an $I$ is heavy for that interval of inverse temperatures. Thus, if we generate a
random sample from $\mu_\beta$ we have a significant probability that the sample is in the
interval $I$. The key observation is that if an interval $I$ is heavy for inverse temperatures
$\beta_1$ and $\beta_2$, then by generating samples from $\mu_{\beta_1}$ and $\mu_{\beta_2}$, and
looking at the proportion of samples whose Hamiltonian falls into interval $I$, we can roughly
estimate $Z(\beta_2)/Z(\beta_1)$.

Thus, if an interval $I$ is heavy for an interval of inverse temperatures $B = [\beta_i, \beta^*]$,
then we can find a $\beta_{i+1} \in B' = [\beta_i, (\beta_i + \beta^*)/2]$ satisfying (52) (making an
optimal move in some sense) or determine there is no such $\beta_{i+1} \in B'$.
In the latter case we construct a sequence of inverse temperatures that goes from $\beta_i$ to
$\beta^*$ where the upper bound in (52) holds for this sequence. We will show that $O(\ln\ln A)$
intermediate inverse temperatures are sufficient to go from $\beta_i$ to $\beta^*$ (the construction
is analogous to the sequence (39) in the proof of Theorem 4.1). Once we reach $\beta^*$ we will
be done with this interval $I$ and will not need to consider it again.

An important fact is that for an interval $I$, the set of $\beta$'s where $I$ is heavy is itself an
interval. Hence, each interval causes a non-optimal step at most once (causing a sequence
of $O(\ln\ln A)$ intermediate inverse temperatures). Thus, our algorithm will find a cooling
schedule whose length is at most

$$O\left((\ln\ln A)\sqrt{(\ln A)\ln n} + \sqrt{\ln A}\,(\ln n)\ln\ln A\right), \tag{54}$$

where the first term comes from Theorem 4.1 and the second term comes from the upper
bound $|P| = O(\sqrt{\ln A}\,\ln n)$ and the fact that the non-optimal steps cause the algorithm
to output a sequence of $O(\ln\ln A)$ intermediate inverse temperatures.

To simplify the high-level exposition of the algorithm we glossed over a technical aspect
of the rough estimator, which sometimes does not allow a move long enough to finish off
the interval $I$. Such a move will be long relative to the reciprocal of the width of the
interval $I$ and will be referred to as a "long" step. ("Long" steps will be analyzed by a
separate argument, and their number will be smaller than (54).) Thus, in the detailed description
of the algorithm we will have three kinds of steps: "optimal" steps, "interval" steps, and
"long" steps.

Combining Theorem 5.1 with Corollary 2.4 we obtain:

Corollary 5.2. Let $Z$ be a partition function. Let $\varepsilon > 0$ be the desired precision. Suppose
that we are given access to oracles which sample from the distribution within variation 
distance 

$$\frac{\varepsilon^2}{10^8 (\ln A)((\ln n) + \ln\ln A)^5}$$

from $\mu_\beta$ for any inverse temperature $\beta$.

Using i^(ln A) ((Inn) + In In A) 5 samples in total, we can obtain a random variable 
S such that 

$$P\big((1 - \varepsilon)Z(\infty) \le S \le (1 + \varepsilon)Z(\infty)\big) \ge 3/4.$$

5.2 Detailed Algorithm Description 

Here we present a detailed description of the algorithm. We also present pseudocode for
the algorithm in Section 10 of the Appendix.

First, we construct a partition $P$ of $\{0, \dots, n\}$ into $O(\sqrt{\ln A}\,\ln n)$ disjoint
intervals. We construct $P$ inductively, starting with interval $[0, 0]$. Suppose that
$\{0, \dots, b - 1\}$ is already partitioned. Let

$$w := \lfloor b/\sqrt{\ln A} \rfloor. \tag{55}$$

Add the interval $[b, b + w]$ to $P$ and continue inductively on $\{b + w + 1, \dots, n\}$. Note, the
initial $\sqrt{\ln A}$ intervals are of size 1 (i.e., contain one natural number), and have width 0.
Later (in Section 5.3) we will show the following explicit upper bound on the number of
intervals in $P$.
Lemma 5.3. $|P| \le 4\sqrt{\ln A}\,\ln n$.
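The inductive construction of $P$ via (55) can be sketched as follows. The function name `build_partition` is ours, intervals are represented as pairs `(b, c)`, and we assume that the last interval is clipped at $n$ (the text does not spell out this boundary case).

```python
import math

def build_partition(n, ln_A):
    """Partition {0,...,n} into intervals [b, b+w] with w = floor(b / sqrt(ln A))."""
    root = math.sqrt(ln_A)
    intervals = []
    b = 0
    while b <= n:
        w = math.floor(b / root)   # width prescribed by (55)
        c = min(b + w, n)          # assumption: clip the final interval at n
        intervals.append((b, c))
        b = c + 1
    return intervals

P = build_partition(1000, 100.0)
print(len(P), P[:12])
```

For these toy parameters the intervals starting at $b < \sqrt{\ln A} = 10$ have width 0, and the number of intervals stays below the bound of Lemma 5.3.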

In each stage of the algorithm we want an interval which is heavy in the following
precise sense.

Definition 5.4. Let $Z$ be a partition function. Let $\beta \in \mathbb{R}^+$ be an inverse temperature.
Let $I = [b, c] \subseteq \{0, \dots, n\}$ be an interval. For $h \in (0, 1)$, we say that $I$ is
$h$-heavy for $\beta$, if for $X$ chosen from $\mu_\beta$ we have

$$\Pr(H(X) \in I) \ge h.$$

The following property will be crucial for our algorithm: the set of inverse temperatures
for which an interval $I$ is heavy is itself an interval (the proof is deferred to Section 6).

Lemma 5.5. Let $Z$ be a partition function. Let $I = [b, c] \subseteq \{0, \dots, n\}$ be an interval.
Let $h \in (0, 1]$. The set of inverse temperatures for which $I$ is $h$-heavy forms an interval
(possibly empty).



Let

$$h = \frac{1}{8|P|}. \tag{56}$$

In our algorithm we will use an interval which is $h$-heavy. Given access to a sampler for
$X \sim \mu_\beta$ one can approximately check whether an interval is $h$-heavy for $\beta$. More
precisely, we can distinguish the case when $I$ is $4h$-heavy from the case when $I$ is not
$h$-heavy. We formalize this observation in Lemma 5.7. First we need the following definition.

Definition 5.6. Let $Z$ be a partition function. Let $I = [b, c] \subseteq \{0, \dots, n\}$ be an interval.
Let $\delta \in (0, 1]$ and let $\beta$ be an inverse temperature. Let $X \sim \mu_\beta$ and let $Y$ be
the indicator function for the event $H(X) \in I$. Let $s = \lceil (8/h)\ln(1/\delta) \rceil$. Let $U$
be the average of $s$ independent samples from $Y$. Let

$$\text{Is-Heavy}(I, \beta) = \begin{cases} \text{true} & \text{if } U \ge 2h, \\ \text{false} & \text{if } U < 2h. \end{cases}$$

Lemma 5.7. If $I$ is not $h$-heavy at inverse temperature $\beta$, then

$$\Pr(\text{Is-Heavy}(I, \beta) = \text{true}) \le \delta. \tag{57}$$

If $I$ is $4h$-heavy at inverse temperature $\beta$, then

$$\Pr(\text{Is-Heavy}(I, \beta) = \text{false}) \le \delta. \tag{58}$$

The above lemma is proved in Section 6.
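A minimal sketch of the test from Definition 5.6 follows. The sampling oracle is modelled by a hypothetical callable `sample_H` that returns $H(X)$ for a fresh $X \sim \mu_\beta$; this interface is our assumption and is not fixed by the text. The guard `max(1, ...)` only avoids an empty sample when $\delta = 1$.

```python
import math

def is_heavy(interval, sample_H, h, delta):
    """Is-Heavy test: compare the empirical frequency of H(X) in I with 2h.

    sample_H is a hypothetical oracle: each call returns H(X) for X ~ mu_beta.
    """
    b, c = interval
    s = max(1, math.ceil((8 / h) * math.log(1 / delta)))  # sample size of Definition 5.6
    hits = sum(1 for _ in range(s) if b <= sample_H() <= c)
    return hits / s >= 2 * h

# Degenerate oracle that always reports H(X) = 3, for illustration only.
print(is_heavy((2, 4), lambda: 3, h=0.1, delta=0.01))
print(is_heavy((5, 6), lambda: 3, h=0.1, delta=0.01))
```

By Lemma 5.7 the answer is only guaranteed (with probability at least $1 - \delta$) when $I$ is either $4h$-heavy or not $h$-heavy; in between, either answer may occur.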

If we take $s = \lceil (8/h)\ln(1/\delta) \rceil$ samples from $\mu_\beta$, and take the interval
which received the most samples, then we are likely to get an $h$-heavy interval. Note that by our
choice of $h$ there exists an $8h$-heavy interval $J$. By Lemma 5.7 it is very likely that $J$
receives more than $2hs$ samples and that all intervals which are not $h$-heavy receive less than
$2hs$ samples. Thus, the interval with the most samples will likely be $h$-heavy.
Corollary 5.8. Given an inverse temperature $\beta$, using $s = \lceil (8/h)\ln(1/\delta) \rceil$
samples from $\mu_\beta$ we can find an $h$-heavy interval. The failure probability of the procedure
is at most $\delta|P|$.

We will need a more general version of Corollary 5.8 in which the set of intervals that
can be chosen is restricted. The forbidden intervals will not be $8h$-heavy and, hence, there
will exist an allowed interval which is $8h$-heavy. Using the same reasoning as we used for
Corollary 5.8 we obtain the following procedure, which we call Find-Heavy.

Corollary 5.9. Let $\beta$ be an inverse temperature. Let Bad be a set of intervals such
that no interval in Bad is $8h$-heavy at $\beta$. Using $s = \lceil (8/h)\ln(1/\delta) \rceil$ samples
from $\mu_\beta$ we can find an $h$-heavy interval which is not in Bad. The
failure probability of the procedure Find-Heavy is at most $\delta|P|$.
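The procedure behind Corollary 5.9 can be sketched as follows, again with a hypothetical oracle `sample_H` returning $H(X)$ for $X \sim \mu_\beta$ and with intervals stored as pairs `(b, c)`. The linear scan over the partition is chosen for brevity, not efficiency.

```python
import math
from collections import Counter

def find_heavy(partition, bad, sample_H, h, delta):
    """Return the most frequently hit interval of `partition` that is not in `bad`."""
    s = max(1, math.ceil((8 / h) * math.log(1 / delta)))
    counts = Counter()
    for _ in range(s):
        x = sample_H()
        for (b, c) in partition:
            if b <= x <= c:
                if (b, c) not in bad:      # banned intervals are never counted
                    counts[(b, c)] += 1
                break
    if not counts:
        return None                        # no allowed interval was hit at all
    return counts.most_common(1)[0][0]

partition = [(0, 0), (1, 1), (2, 3), (4, 7)]
stream = iter([2, 3, 2, 5] * 1000)         # deterministic stand-in for the oracle
print(find_heavy(partition, set(), lambda: next(stream), h=0.125, delta=0.01))
```

With Bad empty this is the procedure of Corollary 5.8; the guarantee that the returned interval is $h$-heavy holds only with the failure probability stated in Corollary 5.9.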

We use the following idea: if a narrow interval is heavy for two nearby inverse temperatures
$\beta_1, \beta_2$ then the interval can be used to estimate the ratio of $Z(\beta_1)$ and $Z(\beta_2)$.

Lemma 5.10. Let $Z$ be a partition function. Let $I = [b, c] \subseteq \{0, \dots, n\}$ be an interval.
Let $\delta \in (0, 1]$. Suppose that $I$ is $h$-heavy for inverse temperatures
$\beta_1, \beta_2 \in \mathbb{R}^+$. Assume that

$$|\beta_1 - \beta_2| \cdot (c - b) \le 1. \tag{59}$$

For $k = 1, 2$ we define the following. Let $X_k \sim \mu_{\beta_k}$ and let $Y_k$ be the indicator
function for the event $H(X_k) \in I$. Let $s = \lceil (8/h)\ln(1/\delta) \rceil$. Let $U_k$ be the
average of $s$ independent samples from $Y_k$. Let

$$\text{Est}(I, \beta_1, \beta_2) := \frac{U_1}{U_2}\exp(b(\beta_1 - \beta_2)). \tag{60}$$

With probability at least $1 - 4\delta$ we have

$$\frac{Z(\beta_2)}{4e\,Z(\beta_1)} \le \text{Est}(I, \beta_1, \beta_2) \le \frac{4e\,Z(\beta_2)}{Z(\beta_1)}. \tag{61}$$

The above lemma is proved in Section 6.
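The estimator (60) can be sketched as below. The two oracles `sample_H1` and `sample_H2` (returning $H(X)$ for $X \sim \mu_{\beta_1}$ and $X \sim \mu_{\beta_2}$) and the toy sampler `make_sampler` are hypothetical helpers introduced only for this illustration; the toy system lets us compare the output with the exact ratio $Z(\beta_2)/Z(\beta_1)$.

```python
import math
import random

def est(interval, sample_H1, sample_H2, beta1, beta2, h, delta):
    """Rough estimator (60): (U1 / U2) * exp(b * (beta1 - beta2))."""
    b, c = interval
    assert abs(beta1 - beta2) * (c - b) <= 1, "condition (59) is violated"
    s = max(1, math.ceil((8 / h) * math.log(1 / delta)))
    u1 = sum(1 for _ in range(s) if b <= sample_H1() <= c) / s
    u2 = sum(1 for _ in range(s) if b <= sample_H2() <= c) / s
    # u2 > 0 is very likely when the interval is h-heavy at beta2; otherwise this raises.
    return (u1 / u2) * math.exp(b * (beta1 - beta2))

def make_sampler(a, beta, rng):
    """Hypothetical exact sampler of H(X), X ~ mu_beta, for Z(beta) = sum_k a[k] e^{-beta k}."""
    weights = [ak * math.exp(-beta * k) for k, ak in enumerate(a)]
    return lambda: rng.choices(range(len(a)), weights=weights)[0]

rng = random.Random(1)
a = [1, 4, 6, 4, 1]
value = est((1, 3), make_sampler(a, 0.2, rng), make_sampler(a, 0.5, rng),
            0.2, 0.5, h=0.25, delta=0.01)
print(value)
```

The estimate is deliberately coarse: Lemma 5.10 only promises a factor of $4e$ in either direction, which is all the binary search for the next inverse temperature requires.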

Remark 5.11. (on imperfect sampling) In the description of our algorithms we will
assume that we can perfectly sample from the distributions $\mu_\beta$. Of course, in applications
we can only sample from distributions which are at a small variation distance $\delta$ from
$\mu_\beta$.

Our algorithms will still work, as the following standard coupling trick shows. We
can couple the biased distributions and the perfect distributions so that they differ with
probability $\delta$. If we take $t$ samples in total then, by the union bound, with probability at
least $1 - \delta t$ the algorithm with biased samplers will have the same output as the algorithm
with perfect samplers.

Remark 5.12. (on randomization) The randomness in our algorithm will come from
the procedures Est and Is-Heavy. The failure probability parameter $\delta$ will be chosen
very small so that during the execution of the algorithm no failures of Est and Is-Heavy
occur with high probability (formally, we use the union bound). Thus in the proof of
correctness we will ignore the possibility of failure of these procedures and deal with the
errors separately.
Remark 5.13. (on binary search) In our algorithm we will have to (approximately)
find the right-most point in an interval $[a, b]$ which satisfies a given predicate $\Pi$. The
predicate will be such that $\Pi(a)$ is true. We use binary search on an interval $[a, b]$ in
the following manner. If $\Pi(b)$ is true then we return $b$. Otherwise we set $\lambda = a$, $\rho = b$
and perform binary search until $\rho - \lambda \le \varepsilon$, where $\varepsilon$ is the precision. Note
that in the end $\Pi(\rho)$ is false and $\Pi(\lambda)$ is true. We return $\lambda$.
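The search of Remark 5.13 is a standard bisection that maintains one true and one false endpoint. A minimal sketch (the function name is ours, and `pred` stands for the predicate $\Pi$):

```python
def rightmost_true(pred, a, b, eps):
    """Binary search of Remark 5.13; assumes pred(a) is true."""
    if pred(b):
        return b
    lo, hi = a, b              # invariant: pred(lo) is true, pred(hi) is false
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if pred(mid):
            lo = mid
        else:
            hi = mid
    return lo

print(rightmost_true(lambda x: x <= 0.37, 0.0, 1.0, 1e-3))
```

Note that the invariant alone does not make the returned point the true right-most one for an arbitrary predicate; in the algorithm this is where the interval structure of Lemma 5.5 is used.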

We now give a detailed description of our algorithm for constructing the cooling schedule.
Let $\delta'$ be the desired final error probability of our algorithm. We will call the
procedures Est, Is-Heavy, and Find-Heavy with the same value of $\delta$, which will be chosen
as follows:

$$\delta = \frac{\delta'}{1600(\ln n)^2(\ln A)^2}. \tag{62}$$

Let

$$s = \lceil (8/h)\ln(1/\delta) \rceil.$$

We will keep a set Bad of banned intervals which is initially empty.

Note it suffices to have the penultimate $\beta$ in the sequence be $\beta_{\ell-1} = \ln A$, since we
can then set $\beta_\ell = \infty$ (see equation (10)). The algorithm for constructing the sequence
works inductively. Thus, consider some starting $\beta_0$.

1. We first find an interval $I$ that is $h$-heavy at $\beta_0$ and is not banned. By generating $s$
samples from the distribution $\mu_{\beta_0}$ and taking the most frequently seen interval, we
will successfully find an $h$-heavy interval with high probability (see Corollary 5.9 for
the formal statement).

2. Let $w$ denote the width of $I$, i.e., $w = c - b$ where $I = [b, c]$. Our rough estimator
(given by Lemma 5.10) only applies for $\beta_1 \le \beta_0 + 1/w$ (by convention $1/0 = \infty$).
Moreover, since we only need to reach a final inverse temperature of $\ln A$, let

$$L = \min\{\beta_0 + 1/w, \ln A\}.$$

Now we concentrate on constructing a cooling schedule within $[\beta_0, L]$.

3. Intuitively, we do binary search in the interval $[\beta_0, L]$ to find the maximum $\beta^*$ such
that $I$ is $h$-heavy at $\beta^*$. We can use binary search because, by Lemma 5.5, the set of
inverse temperatures for which an interval is heavy is an interval in $\mathbb{R}^+$. (More
precisely, we do binary search in the interval $[\beta_0, L]$ with predicate Is-Heavy$(I, \beta)$ and
precision $\varepsilon = 1/(2n)$. We use the binary search procedure described in Remark 5.13.)

4. We now check if there is an "optimal" move within the interval

$$B' = [\beta_0, (\beta_0 + \beta^*)/2].$$

We want to find the maximum $\beta \in B'$ satisfying (52) for $u(\beta_0, \beta)$, or determine
that no such $\beta$ exists. Let $c_1 = e^2$ and $c_2 = 3 \cdot 10^6$ for (52). To find such a $\beta$,
we do binary search and apply Lemma 5.10 to estimate the ratios $Z(2\beta - \beta_0)/Z(\beta)$ and
$Z(\beta_0)/Z(\beta)$. Note for $\beta \in B'$ we have $2\beta - \beta_0 \in [\beta_0, \beta^*]$, hence, the
interval $I$ is $h$-heavy at inverse temperatures $\beta_0$, $\beta$ and $2\beta - \beta_0$, and
Lemma 5.10 applies.¹



¹ More precisely, we perform binary search with predicate Est$(I, \beta_0, \beta) \cdot$ Est$(I, 2\beta - \beta_0, \beta) \le 2000$.
(a) If such a $\beta \in B'$ exists, then we set $\beta$ as the next inverse temperature and
we repeat the algorithm starting from $\beta$. We refer to these steps as "optimal"
moves.

(b) If no such $\beta$ exists, then we can reach the end of the interval $B$ as follows. There
are two cases: either the interval was too wide for the application of Lemma
5.10, or the interval $I$ stops being heavy too soon. More precisely:

i. If $\beta^* = L$, then we set $(\beta_0 + \beta^*)/2$ as the next inverse temperature. Moreover,
if $\beta^* < \ln A$ we continue the algorithm starting from $\beta^*$; whereas if
$\beta^* = \ln A$ we are done. We refer to these steps as "long" moves.

ii. Otherwise, we add the following inverse temperatures to our schedule:

$$\beta_0 + \tfrac{1}{2}\gamma,\ \beta_0 + \tfrac{3}{4}\gamma,\ \beta_0 + \tfrac{7}{8}\gamma,\ \dots,\ \beta_0 + (1 - 2^{-t})\gamma,\ \beta_0 + \gamma,$$

where $\gamma = \beta^* - \beta_0$ and $t = \lceil \ln\ln A \rceil$. We add the interval $I$ to the set
of banned intervals Bad and continue the algorithm starting from $\beta^*$. We
refer to these steps as "interval" moves since the interval $I$ will not be used
by the algorithm again.
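The inverse temperatures added by an "interval" move in step 4(b)ii approach $\beta^*$ geometrically. A small sketch, in which the function name and the example values are ours:

```python
import math

def interval_move_temperatures(beta0, beta_star, ln_A):
    """Temperatures beta0 + (1 - 2^-j) * gamma for j = 1..t, followed by beta0 + gamma."""
    gamma = beta_star - beta0
    t = math.ceil(math.log(ln_A))          # t = ceil(ln ln A)
    temps = [beta0 + (1 - 2.0 ** -j) * gamma for j in range(1, t + 1)]
    temps.append(beta0 + gamma)            # the final point is beta_star itself
    return temps

print(interval_move_temperatures(1.0, 2.0, 100.0))
```

For $\ln A = 100$ this gives $t = 5$ and six new inverse temperatures, in line with the $O(\ln\ln A)$ count used in the length bound.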

Lemma 5.14 (Step 3). Assume that no failures occurred. After step 3 of the algorithm,
the interval $I$ is $h$-heavy for $\beta^*$. Moreover, if $\beta^* \ne L$, then the interval $I$ is not
$8h$-heavy for $\beta^*$.

Lemma 5.15 (Step 4). Assume that no failures occurred. Then after step 4 of the algorithm

$$\frac{Z(\beta_0)\,Z(2\beta - \beta_0)}{Z(\beta)^2} \le 3 \cdot 10^6. \tag{63}$$

Moreover, if $\beta < (\beta^* + \beta_0)/2$, then

$$\frac{Z(\beta_0)\,Z(2\beta - \beta_0)}{Z(\beta)^2} \ge e^2. \tag{64}$$


5.3 Bounding the length of the cooling schedule 

We first estimate $|P|$, the number of intervals in $P$. It is used to bound the number of
interval moves.

Proof of Lemma 5.3:

Let $i \in \{0, \dots, n\}$. Suppose that the interval $I$ containing $i$ starts at $b$. Thus, by (55),
the width of $I$ is $\lfloor b/\sqrt{\ln A} \rfloor$. Since $i$ is in $I$ we have

$$i \le b + \lfloor b/\sqrt{\ln A} \rfloor \le b\left(1 + 1/\sqrt{\ln A}\right) = b\,\frac{1 + \sqrt{\ln A}}{\sqrt{\ln A}}. \tag{65}$$

We can lower bound the width of the interval containing $i$ as follows (in the second
inequality we use (65)):

$$\left\lfloor \frac{b}{\sqrt{\ln A}} \right\rfloor \ge \frac{b}{\sqrt{\ln A}} - 1 \ge \frac{i}{1 + \sqrt{\ln A}} - 1.$$
If for each $i \in \{0, \dots, n\}$ we take the width $w$ of the interval containing $i$ and add up
the $1/(w + 1)$, we obtain the number of intervals. Thus, the total number of intervals is
bounded as follows:

$$|P| \le 1 + \sum_{i=1}^{n} \frac{1 + \sqrt{\ln A}}{i} \le 1 + (1 + \ln n)(1 + \sqrt{\ln A}) \le 4\sqrt{\ln A}\,\ln n. \tag{66}$$



We now bound the number of long moves. 

Lemma 5.16. Assume that no failures occurred during the algorithm. The number of
"long" steps is bounded by $26\sqrt{\ln A}\,\ln n$.

Proof:

At most one "long" move can have $L = \ln A$ (because the algorithm stops at the inverse
temperature $\ln A$). Thus, we only need to estimate the number of "long" moves for which
$L = \beta_0 + 1/w$.

Let $x_k$ be the total number of "long" moves for which the width of the interval $I$ was
$k$. Let $k'$ be the largest $k$ such that $x_k$ is non-zero. Let $y_k = x_k$ for $k < k'$ and let
$y_{k'} = x_{k'} - 1$.

Let $k \in \{1, \dots, k'\}$. Let $t = y_{k'} + y_{k'-1} + \cdots + y_k$. After $t$ "long" moves the
inverse temperature satisfies

$$\beta_0 \ge \sum_{i=k}^{k'} \frac{y_i}{2i} \tag{67}$$

($\beta_0$ would be equal to the right-hand side of (67) if we took the $t$ shortest "long" moves).
Note that $x_{k'} + \cdots + x_k > t$, and, hence, we still have to make a long step with the width of
the $h$-heavy interval $I$ at least $k$. This long step has to happen at an inverse temperature
$\beta_0$, or higher.

We will need the following property: for any interval $[b, c] \in P$ of width $w = c - b$ and
any $i \in [b, c]$ we have

$$i \ge b \ge w\sqrt{\ln A}. \tag{68}$$

This follows directly from (55) (since we chose the width $w$ to be $\lfloor b/\sqrt{\ln A} \rfloor$).
From (68) we have

$$\sum_{i \in I} a_i e^{-\beta_0 i} \le A e^{-\beta_0 k\sqrt{\ln A}}. \tag{69}$$

Assume the left-hand side of (69) is $< h$. Then $I$ is not $h$-heavy for any $\beta \ge \beta_0$, since for

in the last inequality we used $Z(\beta) \ge a_0 \ge 1$, which is true for any $\beta$. Thus, in the binary
search in step 3 of the algorithm, Is-Heavy will always report false and $\beta^*$ will be
about $\beta_0 + 1/(2n)$ (more precisely $\beta^* \le \beta_0 + 1/(2n)$). Hence $\beta^* < L$, which implies
that a "long" move with $I$ of width $\ge k$ is impossible, a contradiction.
Thus, the left-hand side of (69) is $\ge h$, and hence

$$\beta_0 \le \frac{\ln A + \ln(1/h)}{k\sqrt{\ln A}}. \tag{70}$$

By combining (70) and (67) we obtain



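$$\sum_{i=k}^{k'} \frac{y_i}{2i} \le \frac{\ln A + \ln(1/h)}{k\sqrt{\ln A}}. \tag{71}$$

Adding (71) for $k = 1, \dots, k'$ we obtain

$$\sum_{i=1}^{k'} y_i \le 2(1 + \ln n)\,\frac{\ln A + \ln(1/h)}{\sqrt{\ln A}} \le 4(\ln n)\left(\sqrt{\ln A} + \frac{\ln(1/h)}{\sqrt{\ln A}}\right).$$

By Lemma 5.3 and the definition of $h$ (equation (56)) we have

$$1/h \le 32\sqrt{\ln A}\,\ln n,$$

and hence (using our assumptions (24)) we obtain

$$\ln(1/h) \le 5\ln A.$$

The total number of long moves is thus bounded by

$$2 + \sum_{i=1}^{k'} y_i \le 2 + 24(\ln n)\sqrt{\ln A} \le 26\sqrt{\ln A}\,\ln n.$$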



We now prove Theorem 5.1.

Proof of Theorem 5.1:

The number of "optimal" moves is bounded by 



(72) 



$$4\sqrt{(\ln A)\ln n}\,\ln\ln A, \tag{73}$$

see Corollary 4.4. Each "interval" move causes at most $s = 2\ln\ln A$ inverse temperatures
to be output. Hence, by Lemma 5.3 the total number of inverse temperatures output by
"interval" moves is bounded by

$$8\sqrt{\ln A}\,(\ln n)\ln\ln A. \tag{74}$$

Finally, the number of "long" moves is bounded by Lemma 5.16 and it is at most

$$26\sqrt{\ln A}\,\ln n. \tag{75}$$

The total number of moves is bounded by the sum of (73), (74), and (75), which is bounded by
$38\sqrt{\ln A}\,(\ln n)\ln\ln A$. This proves (50).

Let $T = 38\sqrt{\ln A}\,(\ln n)\ln\ln A$. The length of the output schedule is bounded by $T$ and
hence every step of the algorithm is executed at most $T$ times. The binary search in step
4 starts with an interval of width at most $\ln A$ and works with precision $1/(4n)$. The total
number of calls to Est is thus bounded by

$$2T\log_2(8n\ln A) \le 8T(\ln n + \ln\ln A). \tag{76}$$

The total number of calls to Is-Heavy is certainly bounded by (76), since its binary search
starts with an interval of width at most $\ln A$ and works with precision only $1/(2n)$. Finally,
the number of calls to Find-Heavy is at most $T$.

Assuming perfect samples from the $\mu_\beta$, our algorithm can only fail inside Est,
Is-Heavy, and Find-Heavy. For each call this failure is bounded by $4\delta$ for the first two,
and $|P|\delta$ for Find-Heavy (see Lemma 5.7, Lemma 5.10, and Corollary 5.9). By the union
bound the total failure probability is bounded by



where in the second to last inequality we used $\ln\ln A \le \ln A$, and in the last inequality we
used the definition of $\delta$ given by (62).

Of course, requiring perfect samples is too stringent a requirement. Imperfect samples
introduce one more source of error in our algorithm. As discussed in Remark 5.11,
this is dealt with by a coupling argument. By our choice of the variation distance the
imperfectness of samples manifests with probability at most $\delta'/2$.

The number of calls (per invocation) to the $\mu_\beta$ oracles made by any of the three
procedures is



16r(lnn + lnlnA)45 + T(4v / rn _ Alnn) ( 5 < fflTVh^A(lnn)5 

< 2400 (In A) 2 (Inn) 2 5 < 5' /2. 



$$s = \left\lceil (8/h)\ln\frac{1}{\delta} \right\rceil \le 512\sqrt{\ln A}\,(\ln n)\left(\ln 2400 + 2\ln\ln n + 2\ln\ln A + \ln\frac{1}{\delta'}\right). \tag{77}$$



Hence the total number of calls to the $\mu_\beta$ oracles is bounded by

$$Q \le 20T((\ln n) + \ln\ln A)s \le 10^7(\ln A)((\ln n) + \ln\ln A)^5 \ln\frac{1}{\delta'}.$$



6 Leftover Proofs 



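Proof of Lemma 4.2:

We have

$$f(\beta) = \ln Z(\beta) = \ln\left(\sum_{j=0}^{n} a_j e^{-\beta j}\right).$$

Let $Y$ be the random variable defined by $Y = H(X)$ where $X \sim \mu_\beta$. We have

$$f'(\beta) = \frac{Z'(\beta)}{Z(\beta)} = E(-Y) = -E(Y).$$

Since the Hamiltonian $H$ has values in the range $[0, n]$ we obtain parts (a) and (c) of the
lemma.

Similarly

$$f''(\beta) = \frac{Z''(\beta)Z(\beta) - Z'(\beta)^2}{Z(\beta)^2} = \frac{Z''(\beta)}{Z(\beta)} - \left(\frac{Z'(\beta)}{Z(\beta)}\right)^2 = E(Y^2) - E(Y)^2 \ge 0,$$

by Jensen's inequality, proving part (b) of the lemma. ■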
Proof of Lemma 5.7:

Assume that $I$ is $4h$-heavy. Thus, the expected number of samples that fall inside $I$ is at
least $4hs$. By the Chernoff bound (see, e.g., [H Corollary 2.3]) it is very likely that the
number of samples $X$ that fall inside $I$ is greater than $2hs$. Formally,

$$\Pr(X \le 2hs) \le e^{-sh/8} \le \delta. \tag{78}$$

Now assume that $I$ is not $h$-heavy. Thus, the expected number of samples that fall inside
$I$ is at most $hs$. By the Chernoff bound,

$$\Pr(X \ge 2hs) \le e^{-sh/8} \le \delta. \tag{79}$$



Proof of Lemma 5.5:

The interval $I$ is $h$-heavy for $\beta = -\ln x$ if

$$0 \ge h\sum_{i=0}^{n} a_i x^i - \sum_{i \in I} a_i x^i =: g(x). \tag{80}$$

Note that $g(x)$ is a polynomial with at most 2 coefficient sign changes (i.e., looking at the
coefficients sorted by the degree, the sign changes at most twice). Hence, by Descartes'
rule of signs it has at most 2 positive roots. Without loss of generality, we can assume
that $n \notin I$ (otherwise we can "flip" the problem by $i \mapsto n - i$). Thus, $g(x)$ is positive as
$x \to \infty$ and hence the set of $x \in \mathbb{R}^+$ on which $g(x)$ is negative is an interval. Using the
monotonicity of $\ln$ we obtain the result. ■

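Proof of Lemma 5.10:

Note that $E(Y_k) = \Pr(H(X_k) \in I) \ge h$. By the Chernoff bound for $k = 1, 2$ we have

$$\Pr\left(\tfrac{1}{2}E(Y_k) \le U_k \le 2E(Y_k)\right) \ge 1 - 2e^{-hs/8},$$

and hence

$$\Pr\left(\frac{1}{4}\,\frac{E(Y_1)}{E(Y_2)} \le \frac{U_1}{U_2} \le 4\,\frac{E(Y_1)}{E(Y_2)}\right) \ge 1 - 4e^{-hs/8}. \tag{81}$$

We have

$$\frac{E(Y_1)}{E(Y_2)} = \frac{Z(\beta_2)}{Z(\beta_1)} \cdot \frac{\sum_{i \in I} a_i e^{-\beta_1 i}}{\sum_{i \in I} a_i e^{-\beta_2 i}} = \frac{Z(\beta_2)}{Z(\beta_1)}\,e^{b(\beta_2 - \beta_1)} \cdot \frac{\sum_{i \in I} a_i e^{-\beta_2 i + (\beta_2 - \beta_1)(i - b)}}{\sum_{i \in I} a_i e^{-\beta_2 i}},$$

and therefore

$$e^{-1} \le \frac{Z(\beta_1)\,E(Y_1)}{Z(\beta_2)\,E(Y_2)}\,e^{b(\beta_1 - \beta_2)} \le e. \tag{82}$$

Now combining (82), (81), and using assumption (59), the lemma follows. ■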
Proof of Lemma 5.14:

Is-Heavy$(I, \beta^*)$ reported $I$ as $h$-heavy for $\beta^*$. Assume that $\beta^* \ne L$. Then the
binary search ended with interval $[\lambda, \rho]$ where $\lambda = \beta^*$, $\rho \le \beta^* + 1/(2n)$. We have that
Is-Heavy$(I, \rho)$ reported $I$ as not $4h$-heavy for $\rho$. The weight of an interval decreases by a
factor of at most $\sqrt{e}$ between $\lambda$ and $\rho$ and, hence, $I$ is not $8h$-heavy for $\beta^*$. ■

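Proof of Lemma 5.15:

We use Est to refer to the procedure defined in Lemma 5.10. Since there were no failures,
none of the calls to Est failed.
The predicate

$$\text{Est}(I, \beta_0, x)\,\text{Est}(I, 2x - \beta_0, x) \le 2000 \tag{83}$$

was true for $x = \beta$. From (61), we obtain

$$\frac{Z(\beta_0)}{Z(\beta)} \cdot \frac{Z(2\beta - \beta_0)}{Z(\beta)} \le (4e)^2\,\text{Est}(I, \beta_0, \beta)\,\text{Est}(I, 2\beta - \beta_0, \beta),$$

and hence

$$\frac{Z(\beta_0)\,Z(2\beta - \beta_0)}{Z(\beta)^2} \le (4e)^2 \cdot 2000 < 3 \cdot 10^6.$$

Assume that $\beta < (\beta^* + \beta_0)/2$. The binary search ended with an interval $[\lambda, \rho]$ where
$\lambda = \beta$ and $\rho \le \beta + 1/(4n)$. The predicate (83) was false on $\rho$ and hence, using (61),
we obtain

$$\frac{Z(\beta_0)}{Z(\rho)} \cdot \frac{Z(2\rho - \beta_0)}{Z(\rho)} \ge \frac{2000}{(4e)^2}.$$

By Observation we have $Z(2\beta - \beta_0) \ge Z(2\rho - \beta_0)$ and $Z(\beta) \le Z(\rho)e^{1/4}$. Hence

$$\frac{Z(\beta_0)}{Z(\beta)} \cdot \frac{Z(2\beta - \beta_0)}{Z(\beta)} \ge e^{-1/2}\,\frac{Z(\beta_0)}{Z(\rho)} \cdot \frac{Z(2\rho - \beta_0)}{Z(\rho)} \ge e^2.$$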



7 Reversible Cooling Schedules for Warm Starts 

In this section we show how to adapt the schedule generating algorithm to the setting of
"warm starts", which often leads to faster sampling algorithms (see, e.g., [10, 11]).

This method reuses randomness to improve the overall running time. The downside
is a slight dependence between random variables occurring in our algorithm. We will use
the following notion of dependence.
Definition 7.1. Random variables $X, Y$ are $\kappa$-independent if for every (measurable) $A, B$
we have

$$|P(X \in A, Y \in B) - P(X \in A)P(Y \in B)| \le \kappa.$$



We will need the following variant of Theorem implicit in [TT], which allows for
slight dependence between its random variables.

Theorem 7.2. Let $W = (W_1, \dots, W_\ell)$ be a vector random variable. Let $\ell$ be a nonnegative
integer, $K \ge 512\ell/\varepsilon^2$ and $\kappa \le 2^{-20}\varepsilon^2/(K^5\ell)$. Assume that

$W_i$ is $\kappa$-independent from $(W_1, \dots, W_{i-1})$, and
$E(W_i^2)/E(W_i)^2 \le B$

for $i \in [\ell]$. Let $\widehat{W} = W_1 \cdots W_\ell$. Let $S = (S_1, \dots, S_\ell)$ be the average of $K$
samples from $W$. Let $\widehat{S} = S_1 S_2 \cdots S_\ell$. Then

$$\Pr\left((1 - \varepsilon)E(\widehat{W}) \le \widehat{S} \le (1 + \varepsilon)E(\widehat{W})\right) \ge 3/4.$$

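We will use the following two notions of distance between probability distributions. For
a pair of distributions $\nu$ and $\pi$ on a finite space $\Omega$ their total variation distance is
defined as:

$$\|\nu - \pi\|_{TV} = \frac{1}{2}\sum_{x \in \Omega} |\nu(x) - \pi(x)|.$$

In the applications section we will also need to consider the $L^2$ distance defined by:

$$\left\|\frac{\nu}{\pi} - 1\right\|_{2,\pi}^2 = \mathrm{Var}_\pi(\nu/\pi) = \sum_{x \in \Omega} \pi(x)\left(\frac{\nu(x)}{\pi(x)} - 1\right)^2.$$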



For a Markov chain $(X_t)$ with unique stationary distribution $\pi$ we will use the following
result on the distance from stationarity after $t$ steps, starting from a "warm start" $\nu_0$. Let
$\nu_t$ denote the distribution of $X_t$, $t \ge 0$. Let $\tau_2$ denote the inverse spectral gap, commonly
known as the relaxation time, of the Markov chain. We will then use the following well-known
fact (see, e.g., [6], Theorem 5.6).

Lemma 7.3.

$$\|\nu_t - \pi\|_{TV} \le \exp(-t/(2\tau_2))\left\|\frac{\nu_0}{\pi} - 1\right\|_{2,\pi}.$$

Let $\beta_0 = 0 < \beta_1 < \cdots < \beta_\ell = \infty$ be a cooling schedule and let $\mu_i = \mu_{\beta_i}$
(for $i = 0, \dots, \ell$). In our applications we will use the distribution from the previous round
($i$) to serve as a warm start for the current round ($i + 1$). For this we need that the "warm
start" distribution $\mu_i$ is close to the distribution $\mu_{i+1}$, which is the stationary distribution
for the current chain. We will use the $L^2$-notion of warm start, i.e., we will require inverse
temperatures such that

$$\left\|\frac{\mu_i}{\mu_{i+1}} - 1\right\|_{2,\mu_{i+1}} \tag{84}$$

is bounded. Then $\mu_i$ is a good "warm start" for $\mu_{i+1}$ and we can use Lemma 7.3 to upper
bound the mixing time, obtaining a substantial improvement over the usual "cold start"
bound (the saving comes from the fact that $\tau_2$ is often substantially smaller than the
pessimistic "cold start" mixing time $\tau_{mix}$).

A short calculation yields that the $L^2$ distance between distributions $\mu_i$ and $\mu_j$ can
be expressed as a squared coefficient of variation of the variables arising in our algorithm.
More precisely

$$\mathrm{Var}_{\mu_j}(\mu_i/\mu_j) = \frac{Z(\beta_j)\,Z(2\beta_i - \beta_j)}{Z(\beta_i)^2} - 1 = \frac{E(W_{\beta_j,\beta_i}^2)}{E(W_{\beta_j,\beta_i})^2} - 1. \tag{85}$$

Note that for $j = i - 1$ the right-hand side of (85) becomes the right-hand side of
the definition of a $B$-Chebyshev cooling schedule. Thus for a $B$-Chebyshev
cooling schedule

$$\mathrm{Var}_{\mu_i}(\mu_{i+1}/\mu_i) \le B - 1. \tag{86}$$

The left-hand side of (86) is the left-hand side of (84) with the roles of $\mu_i$ and $\mu_{i+1}$
reversed. Thus, the condition that (84) be bounded is equivalent to saying that the schedule

$$\beta_\ell = \infty > \beta_{\ell-1} > \cdots > \beta_1 > \beta_0 = 0$$

(i.e., the schedule in reverse) is a $B$-Chebyshev schedule for some constant $B$. This
motivates the following definition.

Definition 7.4. Let $B > 0$ be a constant. Let $Z$ be a partition function. Let $\beta_0, \dots, \beta_\ell$ be
a sequence of inverse temperatures such that $0 = \beta_0 < \beta_1 < \cdots < \beta_\ell = \infty$. The sequence
is called a reversible $B$-Chebyshev cooling schedule for $Z$ if

$$\frac{Z(2\beta_{i+1} - \beta_i)\,Z(\beta_i)}{Z(\beta_{i+1})^2} \le B, \tag{87}$$

and

$$\frac{Z(2\beta_i - \beta_{i+1})\,Z(\beta_{i+1})}{Z(\beta_i)^2} \le B,$$

for all $i = 0, \dots, \ell - 1$.

Given a $B$-Chebyshev cooling schedule of length $\ell$ it is relatively easy to produce a
reversible $B$-Chebyshev cooling schedule. We do so at the expense of an extra
$O((\ln n) + \ln\ln A)$ factor in the length of the schedule. We will augment each interval
$[\beta_i, \beta_{i+1}]$, $i = 0, \dots, \ell - 2$, by careful initial steps. Let $t$ be the largest integer such
that $2^t/n < \beta_{i+1} - \beta_i$. Note that $t = O((\ln n) + \ln\ln A)$. We insert the following inverse
temperatures between $\beta_i$ and $\beta_{i+1}$:

$$\beta_i + 1/n,\ \beta_i + 2/n,\ \beta_i + 4/n,\ \dots,\ \beta_i + 2^t/n. \tag{89}$$
For $\beta = \beta_i$ and $\beta' = \beta_i + 1/n$ we have, by Lemma 4.2,

$$\frac{Z(2\beta - \beta')\,Z(\beta')}{Z(\beta)^2} \le e.$$
For $\beta = \beta_i + 2^k/n$ and $\beta' = \beta_i + 2^{k+1}/n$ we have $2\beta - \beta' = \beta_i$ and
$\beta \le \beta_{i+1}$. Hence

$$\frac{Z(2\beta - \beta')\,Z(\beta')}{Z(\beta)^2} = \frac{Z(\beta_i)\,Z(2\beta - \beta_i)}{Z(\beta)^2} \le \frac{Z(\beta_i)\,Z(2\beta_{i+1} - \beta_i)}{Z(\beta_{i+1})^2} \le B,$$

since we started with a $B$-Chebyshev cooling schedule. For $\beta = \beta_i + 2^t/n$ and $\beta' = \beta_{i+1}$
the argument is the same.
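The doubling steps $\beta_i + 2^k/n$ inserted into one interval of the schedule can be generated as in the following sketch. The function name is ours; the loop stops at the largest $t$ with $2^t/n < \beta_{i+1} - \beta_i$, and we assume that nothing is inserted when the gap is at most $1/n$ (a case the text does not discuss).

```python
def doubling_steps(beta_i, beta_next, n):
    """Inverse temperatures beta_i + 2^k / n for k = 0..t, where 2^t / n < beta_next - beta_i."""
    gap = beta_next - beta_i
    steps = []
    k = 0
    while 2 ** k / n < gap:
        steps.append(beta_i + 2 ** k / n)
        k += 1
    return steps

print(doubling_steps(0.0, 1.0, 10))
```

Because consecutive inserted points satisfy $2\beta - \beta' = \beta_i$, each reverse step is controlled by the original $B$-Chebyshev bound, and the number of inserted points per interval is logarithmic in $n(\beta_{i+1} - \beta_i)$.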

Theorem 7.5. Let $Z$ be a partition function. Suppose that for every inverse temperature
$\beta$ we have a Markov chain $M_\beta$ with stationary distribution $\mu_\beta$. Assume that the relaxation
time of all the $M_\beta$ chains is uniformly bounded by $\tau_2$. Assume that we can directly sample
from $\mu_0$.

With probability at least $1 - \delta'$, we can produce a reversible $B$-Chebyshev cooling schedule
$\beta_0 = 0 < \beta_1 < \cdots < \beta_{\ell-1} < \beta_\ell = \infty$, for $B = 3 \cdot 10^6$, with

$$\ell \le 38\sqrt{\ln A}\,(\ln n)(\ln\ln A)((\ln n) + \ln\ln A).$$

The algorithm uses at most

$$Q \le 10^7(\ln A)((\ln n) + \ln\ln A)^5\,\tau_2 \ln\frac{1}{\delta'}$$

steps of the $M_\beta$ chains.
Proof:

We will run the original algorithm and then augment the schedule using (89). To facilitate
warm starts we will use the non-adaptive cooling schedule

$$\beta'_0 = 0 < \beta'_1 < \cdots < \beta'_{\ell'} = \infty \tag{90}$$

of [1] (equation (11) in this paper). We start with a random sample at the inverse
temperature 0, run $M_{\beta'_1}$ for $\tau_2$ steps, then $M_{\beta'_2}$ for $\tau_2$ steps and so on. This way we
obtain warm starts for all inverse temperatures in the schedule (90). Note that we made
$O(\tau_2(\ln n)\ln\ln A)$ steps of the chains so far.

Now we can run algorithm Print-Cooling-Schedule and spend only $\tau_2$ steps to
generate a random sample at any inverse temperature $\beta$. We use the closest inverse
temperature in (90) as a warm start. ■

Combining Theorem 7.2 with Theorem 7.5 we obtain:

Corollary 7.6. Let $Z$ be a partition function. Let $\varepsilon > 0$ be the desired precision. Suppose
that for every inverse temperature $\beta$ we have a Markov chain $M_\beta$ with stationary
distribution $\mu_\beta$. Assume that the relaxation time of all the $M_\beta$ chains is uniformly bounded by
$\tau_2$. Assume that we can directly sample from $\mu_0$. Using

steps of the $M_\beta$ chains we can obtain a random variable $S$ such that

$$P\big((1 - \varepsilon)Z(\infty) \le S \le (1 + \varepsilon)Z(\infty)\big) \ge 3/4.$$
8 Applications



We detail several specific applications of our work: matchings, Ising model, colorings and
independent sets. To simplify the comparison of our results with previous work and since
we have not optimized polylogarithmic factors in our work, we use $O^*()$ notation which
hides polylogarithmic terms and the dependence on $\varepsilon$. Our cooling schedule results in
a savings of a factor of $O^*(n)$ in the running time for all of the approximate counting
problems considered here.



8.1 Matchings 

We first consider the problem of generating a random matching of an input graph
$G = (V, E)$. Let $\lambda = \exp(-1/\beta)$ and let $\Omega$ denote the set of matchings of $G$. For
$M \in \Omega$, let $w(M) = \lambda^{|M|}$ (where $0^0 = 1$). The Gibbs distribution is then
$\mu(M) = w(M)/Z$ where $Z = \sum_{M' \in \Omega} w(M')$. Note, for $\beta = \infty$ (i.e., $\lambda = 1$), $\mu$ is
uniform over $\Omega$, whereas for $\beta = 0$ (i.e., $\lambda = 0$), $Z = 1$ since the empty set is the only
matching with positive weight.

Consider the following ergodic Markov chain with stationary distribution $\mu$. Let
$X_0 \in \Omega$ where $w(X_0) > 0$. From $X_t \in \Omega$,

• Choose $e = (u, v)$ uniformly at random from $E$.

• Set

$$X' = \begin{cases} X_t \setminus e & \text{if } e \in X_t, \\ X_t \cup e & \text{if } u \text{ and } v \text{ are unmatched in } X_t, \\ X_t \cup e \setminus (v, w) & \text{if } u \text{ is unmatched in } X_t \text{ and } (v, w) \in X_t, \\ X_t \cup e \setminus (u, z) & \text{if } v \text{ is unmatched in } X_t \text{ and } (u, z) \in X_t, \\ X_t & \text{otherwise.} \end{cases}$$

• Let $X_{t+1} = X'$ with probability $\min\{1, w(X')/w(X_t)\}/2$, and otherwise set
$X_{t+1} = X_t$.
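One transition of the chain above can be sketched as follows. The representation is our choice: a matching is a `frozenset` of edges, each edge a tuple `(u, v)` taken verbatim from the edge list, and `lam` plays the role of $\lambda$. We assume $w(X_t) > 0$, as in the text, so the weight ratio is well defined.

```python
import random

def matching_chain_step(matching, edges, lam, rng):
    """One step of the add/remove/slide chain with a lazy Metropolis filter."""
    u, v = e = rng.choice(edges)
    cover = {}                         # matched vertex -> edge of the matching covering it
    for (a, b) in matching:
        cover[a] = (a, b)
        cover[b] = (a, b)
    if e in matching:
        proposal = matching - {e}                       # remove e
    elif u not in cover and v not in cover:
        proposal = matching | {e}                       # add e
    elif u not in cover:
        proposal = (matching - {cover[v]}) | {e}        # slide: drop the edge at v
    elif v not in cover:
        proposal = (matching - {cover[u]}) | {e}        # slide: drop the edge at u
    else:
        proposal = matching
    # w(M) = lam ** |M|, so the ratio depends only on the change in size.
    ratio = lam ** (len(proposal) - len(matching))
    if rng.random() < min(1.0, ratio) / 2:
        return proposal
    return matching

rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]   # a 4-cycle, for illustration
state = frozenset()
for _ in range(1000):
    state = matching_chain_step(state, edges, 0.5, rng)
print(sorted(state))
```

Every proposal is again a matching, so the chain never leaves $\Omega$; the factor $1/2$ in the acceptance probability makes the chain lazy.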

Jerrum and Sinclair [8] proved that the above Markov chain has relaxation time
$\tau_2 = O(nm)$ (see [6] for the claimed upper bound).

Since $A < n!\,2^n$, using Theorem 7.5 we obtain a cooling schedule of length
$\ell = O(\sqrt{n} \log^4 n)$. In contrast, the previous best schedule, presented in [1], had
length $O(n \log^2 n)$. Thus, we save a factor of $O^*(n)$ in the running time for approximating
$Z$. Applying Corollary 7.6 we obtain the following result.

Corollary 8.1. For any $G = (V,E)$, let $\mathcal{M}(G)$ denote the set of matchings
of $G$. For all $\varepsilon > 0$, we can compute an estimate EST such that:

$\mathrm{EST}(1 - \varepsilon) < |\mathcal{M}(G)| < \mathrm{EST}(1 + \varepsilon)$

with probability $> 3/4$ in time $O(n^2 m \varepsilon^{-2} \log^7 n) = O^*(n^2 m)$.

Recall, the success probability 3/4 can be replaced by $1 - \delta$, for any $\delta > 0$, at
the expense of an extra factor of $O(\log(1/\delta))$ in the running time.
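One standard way to achieve this kind of amplification is the median trick: run the estimator independently $O(\log(1/\delta))$ times and output the median. The text does not spell out the mechanism, so the sketch below is an assumption on our part; the helper name `amplify_by_median` and the constant 8 (from a Hoeffding bound for runs that each fail with probability at most 1/4) are illustrative.

```python
import math
import random
import statistics

def amplify_by_median(estimate_once, delta):
    """Median of independent runs of an estimator that succeeds w.p. >= 3/4.

    If each run lands in the target interval with probability at least 3/4,
    Hoeffding's inequality bounds the chance that half or more of k runs
    miss it by exp(-k/8), so k >= 8 ln(1/delta) runs suffice.
    """
    runs = max(1, math.ceil(8 * math.log(1.0 / delta)))
    if runs % 2 == 0:
        runs += 1  # an odd count makes the median one of the actual estimates
    return statistics.median(estimate_once() for _ in range(runs))
```

The median lies in the target interval whenever more than half of the runs do, which is why a constant success probability above 1/2 is all that the base estimator needs.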
8.2 Spin Systems



Spin systems are a general class of statistical physics models where our results apply. We
refer the reader to [13, 16] for an introduction to spin systems. The examples we highlight
here are well-studied examples of spin systems. Recall, the mixing time of a Markov
chain is the number of transitions (from the worst initial state) to reach within variation
distance $< \delta$ of the stationary distribution, where $0 < \delta < 1$. The following
results follow in a standard way from the stated mixing time result combined with Corollary 5.2.

Colorings: For a graph $G = (V, E)$ with maximum degree $\Delta$ we are interested in
approximating the number of $k$-colorings of $G$. Here, we are coloring the vertices using
a palette of $k$ colors so that adjacent vertices receive different colors. This problem is
also known as the zero-temperature (thus $\beta = \infty$) anti-ferromagnetic Potts model. The
simple single-site update Markov chain known as the Glauber dynamics is ergodic with
unique stationary distribution uniform over all $k$-colorings whenever $k > \Delta + 2$. There
are various regions where fast convergence of the Glauber dynamics is known; we refer the
interested reader to [2] for a survey. For concreteness we consider the result of Jerrum [7],
who proved that the Glauber dynamics has mixing time $O(kn\log(n/\delta))$ whenever
$k > 2\Delta$. Moreover, his proof easily extends to any non-zero temperature. (Recall, the
mixing time of a Markov chain is the number of steps so that, from the worst initial state, we
are within variation distance $< \delta$ of the stationary distribution.) Since $A = k^n$, using
Corollary 5.2 we obtain Corollary 8.2 below.
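For readers unfamiliar with the Glauber dynamics on colorings, the following is a minimal sketch of one common formulation: pick a vertex and a color uniformly at random, and recolor only if the result is still a proper coloring. The function name `glauber_step` and the adjacency-dictionary representation are our own assumptions for illustration.

```python
import random

def glauber_step(adj, coloring, k, rng=random):
    """One single-site update for proper k-colorings (illustrative sketch).

    adj      -- dict mapping each vertex to a list of its neighbors
    coloring -- dict mapping each vertex to a color in range(k); must be proper
    k        -- number of colors
    The coloring is updated in place and also returned.
    """
    v = rng.choice(list(adj))   # uniformly random vertex
    c = rng.randrange(k)        # uniformly random color
    # Recolor v only if no neighbor currently has color c.
    if all(coloring[u] != c for u in adj[v]):
        coloring[v] = c
    return coloring
```

Every proposed move is reversible and the proposal is symmetric, so the uniform distribution over proper colorings is stationary for this chain.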

Corollary 8.2. For all $k > 0$ and any graph $G = (V,E)$ with maximum degree $\Delta$, let
$\Omega(G)$ denote the set of $k$-colorings of $G$. For all $\varepsilon > 0$, whenever
$k > 2\Delta$, we can compute an estimate EST such that:

$\mathrm{EST}(1 - \varepsilon) < |\Omega(G)| < \mathrm{EST}(1 + \varepsilon)$

with probability $> 3/4$ in time $O(kn^2 \varepsilon^{-2} \log^6 n) = O^*(n^2)$.

In comparison, the previous bound [1] required $O^*(n^3)$ time (and Jerrum [7] required
$O^*(nm^2)$ time).

Ising model: There are extensive results on sampling from the Gibbs distribution
and approximating the partition function of the (ferromagnetic) Ising model. We refer
the reader to [13] for background and a survey of results. We consider a particularly
well-known result. For the Ising model on a $\sqrt{n} \times \sqrt{n}$ 2-dimensional grid,
Martinelli and Olivieri [14] proved that the Glauber dynamics (i.e., single-site update Markov
chain) has mixing time $O(n\log(n/\delta))$ for all $\beta > \beta_c$ where $\beta_c$ is the
critical point for the phase transition between uniqueness and non-uniqueness of the
infinite-volume Gibbs measure. In this setting, we have $A = 2^n$ and, hence, we obtain the
following result.

Corollary 8.3. For the Ising model on a $\sqrt{n} \times \sqrt{n}$ 2-dimensional grid, let
$Z(\beta)$ denote the partition function at inverse temperature $\beta > 0$. For all
$\varepsilon > 0$, for all $\beta > \beta_c$, we can compute an estimate EST such that:

$\mathrm{EST}(1 - \varepsilon) < Z(\beta) < \mathrm{EST}(1 + \varepsilon)$

with probability $> 3/4$ in time $O(n^2 \varepsilon^{-2} \log^{1} n) = O^*(n^2)$.
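To make the quantity $A = 2^n$ concrete, the sketch below computes an Ising partition function by brute force on a tiny grid. The Hamiltonian convention is an assumption on our part (each configuration has weight $\exp(-\beta \cdot \#\{\text{edges whose endpoints disagree}\})$, free boundary), as is the function name; the check $Z(0) = 2^n$ holds for any such convention, since at $\beta = 0$ every configuration has weight 1.

```python
import itertools
import math

def ising_partition_function(side, beta):
    """Brute-force Z(beta) on a side x side grid; feasible only for tiny grids.

    Assumed convention: a spin configuration has weight
    exp(-beta * number of grid edges whose endpoints disagree).
    """
    sites = [(r, c) for r in range(side) for c in range(side)]
    # Horizontal and vertical nearest-neighbor edges, free boundary.
    edges = [((r, c), (r + dr, c + dc))
             for (r, c) in sites
             for dr, dc in ((0, 1), (1, 0))
             if r + dr < side and c + dc < side]
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=len(sites)):
        sigma = dict(zip(sites, spins))
        disagreements = sum(sigma[a] != sigma[b] for a, b in edges)
        total += math.exp(-beta * disagreements)
    return total
```

On a $2 \times 2$ grid ($n = 4$) this gives $Z(0) = 16 = 2^4$, and under the assumed convention $Z(\beta) = 2 + 12e^{-2\beta} + 2e^{-4\beta}$.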
Independent Sets: Given a fugacity $\lambda > 0$ and a graph $G = (V, E)$ with maximum
degree $\Delta$, we are interested in computing

$Z_G(\lambda) = \sum_{I \in \Omega} \lambda^{|I|}$

where $\Omega$ is the set of independent sets of $G$. This is known as the hard-core lattice
gas model. In [15], it was proved that the Glauber dynamics for sampling from the distribution
corresponding to $Z_G(\lambda)$ has $O(n\log(n/\delta))$ mixing time whenever
$\lambda < 2/(\Delta - 2)$. As a consequence, we obtain the following result.

Corollary 8.4. For any graph $G = (V, E)$ with maximum degree $\Delta$, for all
$\varepsilon > 0$, for any $\lambda < 2/(\Delta - 2)$, we can compute an estimate EST such that:

$\mathrm{EST}(1 - \varepsilon) < Z_G(\lambda) < \mathrm{EST}(1 + \varepsilon)$

with probability $> 3/4$ in time $O(n^2 \varepsilon^{-2} \log^6 n) = O^*(n^2)$.

Note, Weitz [17] has an alternative approach for this problem. His approach approximates
$Z_G(\lambda)$ directly (without using sampling) and holds for a larger range of $\lambda$
(though $\Delta$ is required to be constant).
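As a concrete reference point for the definition of $Z_G(\lambda)$, the following brute-force sketch enumerates all independent sets of a small graph. It runs in exponential time and is meant only for checking estimates on toy instances; the function name and the vertex labeling $0, \dots, n-1$ are our own assumptions.

```python
import itertools

def hardcore_partition_function(n, edges, lam):
    """Brute-force Z_G(lambda): sum of lam ** |I| over independent sets I.

    n     -- number of vertices, labeled 0 .. n-1
    edges -- list of (u, v) pairs
    lam   -- the fugacity lambda
    """
    total = 0.0
    for size in range(n + 1):
        for subset in itertools.combinations(range(n), size):
            chosen = set(subset)
            # A subset is independent if no edge has both endpoints chosen.
            if all(not (u in chosen and v in chosen) for u, v in edges):
                total += lam ** size
    return total
```

For the path on three vertices the independent sets are $\emptyset$, the three singletons, and the two endpoints together, so $Z_G(\lambda) = 1 + 3\lambda + \lambda^2$.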

9 Discussion 

An immediate question is whether these results extend to estimating the permanent of a 
0/1 matrix. Our current adaptive scheme works assuming a sampling subroutine that can 
produce samples at any given temperature (from a warm start). The permanent algorithm 
of [9] also requires a set of $n^2 + 1$ weights to produce samples from a given temperature.
These weights are computed from $n^2 + 1$ partition functions and it appears that a schedule
of length $\Omega(n)$ is necessary if one considers all $n^2 + 1$ partition functions simultaneously.
In fact, this is the case for the standard bad example of a chain of boxes (or a chain of 
hexagons as illustrated in Figure 2 of [9]). 
10 Appendix

10.1 Algorithm Pseudocode

input : A black-box sampler for $X \sim \mu_\beta$ for any $\beta > 0$,
        starting inverse temperature $\beta_0$.
output: A cooling schedule for $Z$.

    Bad $\leftarrow \emptyset$
    print $\beta_0$
    if $\beta_0 < \ln A$ then
1       $I \leftarrow$ Find-Heavy($\beta_0$, Bad)
2       $w \leftarrow$ the width of $I$
        $L \leftarrow \min\{\beta_0 + 1/w, \ln A\}$                (where $1/0 = \infty$)
3       $\beta^* \leftarrow$ binary search on $\beta^* \in [\beta_0, L]$ with precision $1/(2n)$,
            using predicate Is-Heavy($\beta^*$, $I$)
4       $\beta \leftarrow$ binary search on $\beta \in [\beta_0, (\beta^* + \beta_0)/2]$ with precision $1/(4n)$,
            using predicate Est($I$, $\beta_0$, $\beta$) $\cdot$ Est($I$, $2\beta - \beta_0$, $\beta$) $< 2000$
        if $\beta < (\beta^* + \beta_0)/2$ then
            Print-Cooling-Schedule($\beta$)                ("optimal" move)
        else
            if $\beta^* = L$ then
                Print-Cooling-Schedule($\beta$)            ("long" move)
            else
                $\gamma \leftarrow (\beta^* - \beta_0)/2$
                print $\beta_0 + \gamma$, $\beta_0 + (3/2)\gamma$, $\beta_0 + (7/4)\gamma$, ..., $\beta_0 + (2 - 2^{-\lceil \ln\ln A \rceil})\gamma$
                Bad $\leftarrow$ Bad $\cup\ I$
                Print-Cooling-Schedule($\beta^*$)          ("interval" move)
            end
        end
    else
        print $\infty$
    end

Algorithm 1: Print-Cooling-Schedule
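The points printed in the "interval" move approach $\beta^*$ geometrically: the $j$-th point is $\beta_0 + (2 - 2^{-j})\gamma$ for $j = 0, 1, \dots, \lceil \ln\ln A \rceil$. The sketch below generates them; it assumes $A > e$ so that $\ln\ln A > 0$, and the helper name `interval_move_points` is ours. The subroutines Find-Heavy, Is-Heavy and Est are not reproduced here.

```python
import math

def interval_move_points(beta0, beta_star, A):
    """Inverse temperatures printed by the "interval" move (illustrative sketch).

    Returns beta0 + (2 - 2**-j) * gamma for j = 0 .. ceil(ln ln A),
    where gamma = (beta_star - beta0) / 2. Assumes A > e.
    """
    gamma = (beta_star - beta0) / 2.0
    last = math.ceil(math.log(math.log(A)))
    return [beta0 + (2.0 - 2.0 ** (-j)) * gamma for j in range(last + 1)]
```

Each successive point halves the remaining distance to $\beta^* = \beta_0 + 2\gamma$, so only $O(\ln\ln A)$ points are added before the recursion resumes at $\beta^*$.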